

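This page is an excerpt and archive of the corresponding post on MITBBS (未名空间). Posts less than a week old show at most 50 characters; older posts show up to 500 characters.
HuNan board - trading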
f******l
Posts: 18
Hello, and welcome to this freeCodeCamp course on algorithmic trading and Python. My name is Nick McCullum, and I'm going to be your instructor for this course. In this course, we're going to focus on building three large quantitative finance projects. In the first project, we are going to build an equal-weight version of the popular S&P 500 index fund. In the second project, we are going to build a quantitative momentum strategy that selects the best stocks based on a variety of momentum investing metrics. And in the third project, we're going to build a quantitative value screener that selects stocks that are attractive based on a number of value metrics. So that's a quick summary of the three projects that we are going to build in this course.
Before we move on to learning more about the course and those projects, I have a couple of housekeeping items to go over. First, this freeCodeCamp course on algorithmic trading and Python is made possible through a grant from IEX Cloud. If you've ever read the book Flash Boys by Michael Lewis, you may have heard of it. In this course, we're going to be using some of their tools and APIs to populate the data that we will need for these algorithmic trading strategies.
The other quick thing that you're going to notice, starting now, is a little message at the bottom of your screen that's going to stay there for the rest of the video. That message is there to make clear that this course is for educational purposes only. I'm going to be providing some high-level finance concepts, but nothing in this course or in this video should be construed as investment advice. So, just to reiterate: this course is for educational purposes only, and none of it should be considered investment advice. With all of that out of the way, let's dig into what this course is going to teach you, starting with a quick course overview.
The first thing that we're going to discuss is some basics of the field of algorithmic trading. Then we're going to discuss some API basics and quickly go over how the course is configured and laid out. Next, we're going to discuss the first project of this course, which is an equal-weight version of the popular S&P 500 index fund. Then we're going to talk about our second project, which, as I mentioned earlier, is a quantitative momentum strategy. And then we're going to briefly discuss our third project, which is a quantitative value strategy. So that's a 30,000-foot view of what you're going to be learning in this course.
Let's zoom in a little and talk about some algorithmic trading basics. For anyone who's completely unfamiliar with algorithmic trading, it basically means using computers to make investment decisions. In the past, you may have had a team of financial analysts or investment researchers consider stocks and decide what to buy; many of the popular investing strategies implemented today actually rely more on computers than on humans. So algorithmic trading just means using computers to make investment decisions. There are many different types of algorithmic trading. At one end of the spectrum, you have super-high-frequency trading: really sophisticated, complex, big-brain strategies. At the other end, you can take basically any type of fundamental strategy that humans have used in the past and make it a little more efficient by adding some computer software. As this slide alludes to, the main difference between the many types of algorithmic trading is generally the speed at which trades are executed.
Before we dig further into the world of algorithmic trading, it's helpful to have a high-level understanding of who the main players are in this space. One of the reasons I wanted to provide this information at the start of the course is so that you can see that the field of algorithmic trading is very big, and if you do want to make a career of it, there are lots of employers, opportunities, and jobs in this space. So I'm just going to briefly discuss a few of the largest players in the algorithmic trading landscape. The first one, the 50,000-pound gorilla in the room, is Renaissance Technologies. I believe they're based on Long Island, and they have $165 billion in assets under management. Renaissance Technologies is most famous for its Medallion Fund, which at this point is only open to employees and which I think has returned something insane, like 50% a year for 20 years or so. None of us can access that fund because it's for employees only, but Renaissance Technologies probably has one of the best-performing investment funds in the history of finance. Perhaps most interestingly, Renaissance Technologies was not founded by a finance major or an investment analyst. The person who started it was a math PhD from, I believe, a California university. His name is Jim Simons, and many people consider him to be one of the forefathers of quantitative finance. So that's the largest player in this space.
Another popular one is AQR Capital Management, which was founded by Cliff Asness and a few other managing partners and has $61 billion in assets under management. Unlike Renaissance Technologies, AQR Capital Management actually offers strategies that you can invest in through mutual funds and other types of vehicles. One last interesting thing about that firm is that its name is a bit like IBM's: just as IBM stands for the very bland International Business Machines, the AQR in AQR Capital Management stands for Applied Quantitative Research. So if you've ever been curious what that stands for, that's what it is. The last one is Citadel Securities, which has $32 billion in assets under management. They're much more of a high-frequency trading firm than the first two that I mentioned, and they're famous for their role in the market-making space. Citadel was founded by Ken Griffin. That summarizes three of the large players. You can tell from the assets under management of these three firms alone, more than $200 billion combined, that there are lots of opportunities and jobs and that this is a very large space, so it's definitely worth learning a bit about. As you can probably tell from the title, in this course we are going to be using Python for these algorithmic trading strategies.
Before we get into how to use Python and how to write the code, there are a few high-level things you should understand. The first is that Python is probably the most popular programming language for algorithmic trading, largely because it has a huge ecosystem of libraries. The downside of Python is that it's a fairly slow programming language, so if you want to execute high-performance code, Python is typically not your best bet. The solution that many practitioners have found is to use Python as a glue language that triggers code which actually runs in other languages. A common example of this is the NumPy library, which we'll be using in this course. NumPy is the most popular Python library for numerical computing, and it is perhaps best known for its data structure called the NumPy array, which lets you easily store and manipulate one-, two-, or higher-dimensional data in Python. Although NumPy is a Python library that you call and manipulate from Python, its core functionality is written in C, a much faster language, which gives large performance gains to Python users who use NumPy in their code. To summarize, the main idea of this section is that Python is the most popular programming language for quantitative finance, but it's also a slow language, so you will often use Python to trigger functionality that actually runs in other, faster programming languages.
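To make the glue-language idea concrete, here is a minimal sketch (not code from the course) that computes the same portfolio value twice: once with a pure-Python loop, where the interpreter executes every multiply and add, and once with a single NumPy call whose inner loop runs in compiled code. The share counts and prices are made up for illustration.

```python
import numpy as np

def portfolio_value_loop(shares, prices):
    """Pure-Python version: the interpreter runs every multiply and add."""
    total = 0.0
    for qty, price in zip(shares, prices):
        total += qty * price
    return total

def portfolio_value_numpy(shares, prices):
    """NumPy version: one vectorized dot product executed in compiled code."""
    return float(np.dot(np.asarray(shares, dtype=float), np.asarray(prices, dtype=float)))

# Made-up positions: number of shares held and price per share.
shares = [10, 5, 20]
prices = [150.0, 300.0, 25.5]
print(portfolio_value_loop(shares, prices))   # 3510.0
print(portfolio_value_numpy(shares, prices))  # 3510.0
```

For three numbers the difference is negligible; the vectorized form pays off when the arrays hold thousands or millions of values.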
Now let's quickly talk about the algorithmic trading process, which can be broken down into the following steps. The first step is to collect data. The second step is to develop a hypothesis for a strategy. The third step is to backtest that strategy. Backtesting just means formulating your strategy and then seeing how it would have performed historically. Generally, there are two dimensions you want to stretch a backtest along: you want to take the strategy as far back in time as you can, and across as many markets as you can. As a quick example, say that you have a hypothesis that the largest firms outperform. You would probably want to test that here in the United States, going as far back in time as you can, and then you would also want to test the performance of the largest firms in Canada, Europe, Japan, China, India, and other international markets as well. If the strategy performs well in all of those markets, then you're probably on to something. The fourth step is to implement that strategy in production.
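A backtest in its simplest form compounds a strategy's historical period returns and compares the result with a benchmark. The sketch below is a hypothetical illustration, not the course's code: it assumes you already have per-period returns (as decimals) for the strategy and for a benchmark in each market, and it reports whether the strategy beat its benchmark in every market, mirroring the "as many markets as you can" advice above. The return figures are invented.

```python
from math import prod

def cumulative_return(period_returns):
    """Compound the period returns: growth of $1 minus the initial $1."""
    return prod(1.0 + r for r in period_returns) - 1.0

def outperforms_everywhere(results):
    """results maps market name -> (strategy_returns, benchmark_returns).

    Returns a dict of market -> bool and an overall flag that is True only
    if the strategy beat its benchmark in every market tested.
    """
    by_market = {
        market: cumulative_return(strategy) > cumulative_return(benchmark)
        for market, (strategy, benchmark) in results.items()
    }
    return by_market, all(by_market.values())

# Invented yearly returns, for illustration only.
results = {
    "US": ([0.10, -0.05, 0.12], [0.08, -0.06, 0.10]),
    "Japan": ([0.02, 0.04], [0.03, 0.05]),
}
by_market, everywhere = outperforms_everywhere(results)
print(by_market)   # {'US': True, 'Japan': False}
print(everywhere)  # False
```

A real backtest would also account for trading costs, rebalancing dates, and survivorship bias in the historical data; this sketch only shows the comparison step.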
So you've collected data, developed a hypothesis, and backtested the strategy, and now you're going to start actually trading that strategy with real money in real accounts and seeing how it performs going forward. Now that you have a high-level understanding of the algorithmic trading process, there are a few ways that this course will differ from a real algorithmic trading strategy, and I want to highlight those quickly before we dive into the three projects we'll be building.
To quickly summarize, there are three major ways that this course will be different. The first is that we'll be using random data. The data provider that we're using for this course, IEX Cloud, is a paid API. If you want to use real data, you can follow the steps in this course but pay for an account and use a real API token. What we'll be using instead is a sandbox API token, which provides random data but is free. That means you can do all the normal things that you can do with the IEX Cloud API, but the data will be randomly generated.
So we may see some interesting results because of that. The second difference is that we will not actually be executing trades in this course. The reason is that many of the people who take this course will be using different custodians to execute their trades, and each of those custodians has a different API infrastructure. I wanted to make this course as widely applicable as possible, so instead we're going to imagine that we work at a firm where our traders expect us to generate Excel documents and send them over so that they can execute the trades. The output of each of these three strategies will be an Excel document listing the name of each company that needs to be purchased and the number of shares of each company that the trader needs to buy. So we won't be executing trades; instead, we'll be generating order sheets to send to our imaginary traders further down the line. That leads to the third difference: we're going to be saving these recommended trades to Excel files, which can then be sent to the traders afterwards.
Alright, now that we have a high-level understanding of the algorithmic trading landscape, it's time to talk about API basics and configuration. To start, let's discuss what an API is. API stands for application programming interface. You've probably heard the term API before, but the name doesn't actually tell you what it does.
I think the easiest way to understand an API is that it's a way for your software to interact with, and potentially control, someone else's software. In this case, we're going to use the IEX Cloud API to access their database of financial data and import it into our Python script. So, like I said, APIs allow you to interact with someone else's software using your own code, and in this course we're going to use the IEX Cloud API to gather stock market data to make investment decisions. To give you a quick example of what that looks like, here's a code snippet. This is a cell from a Jupyter Notebook with four lines. On the first line, we declare a symbol, AAPL, which is the stock symbol for Apple on the NASDAQ exchange. On the second line, we create a string called api_url that interpolates the symbol and our IEX Cloud API token into the URL. On the third line, we use Python's requests library to send a GET request to that API URL. And on the last line, we just print the data that gets returned.
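The slide itself isn't reproduced in this transcript, so the following is only a reconstruction sketch of that four-line cell, wrapped in two small functions. The sandbox base URL, the /stock/{symbol}/quote path, and the token name IEX_CLOUD_API_TOKEN are assumptions based on IEX Cloud's v1 URL layout rather than details confirmed here, and IEX Cloud has since been shut down, so treat the endpoint as illustrative.

```python
import requests

# Assumed IEX Cloud v1 sandbox base URL (the service may no longer be available).
SANDBOX_BASE_URL = "https://sandbox.iexapis.com/stable"

def build_quote_url(symbol: str, token: str) -> str:
    """Interpolate the ticker symbol and the API token into the quote endpoint URL."""
    return f"{SANDBOX_BASE_URL}/stock/{symbol}/quote?token={token}"

def fetch_quote(symbol: str, token: str, timeout: float = 10.0) -> dict:
    """Send a GET request for one symbol and return the parsed JSON body."""
    response = requests.get(build_quote_url(symbol, token), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()
```

With a working token, the original four lines correspond to `symbol = "AAPL"`, then `data = fetch_quote(symbol, IEX_CLOUD_API_TOKEN)`, then `print(data)`. Splitting out `build_quote_url` keeps the string interpolation testable without a network call.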
This is a lot, and if you don't fully understand it, don't worry: you will see it in much more detail when we actually build the projects for this course. So that's what an API call looks like. Broadly speaking, in this course we're only going to be using GET requests to gather data from the IEX Cloud API. If you've never used an API before, there are many different ways you can interact with them.
But there are four major ones, and I'm going to briefly show you the other ways to interact with an API before we proceed through this course. GET requests allow you to get data from the API, and that's what we'll be using in this course. An alternative is a POST request, which is a method for adding data to the database that's exposed by the API. So a GET request lets you get data from the API, and a POST request lets you push data to the API. A PUT request allows you to add and overwrite data in the database that's exposed by the API. You can think of a POST request as a create-only request and a PUT request as a create-or-replace request. Even though that's a small difference, it's an important one, because it's a common source of bugs for software developers. The fourth important type of API call is DELETE, which, as its name implies, deletes data from the API's database. So, to summarize this section: we're going to use GET requests exclusively in our course, but there are three other main types of requests that you should understand: POST, PUT, and DELETE.
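As a minimal illustration of the four request types, the sketch below builds (but does not send) one request per HTTP method using the standard-library class urllib.request.Request. The base URL and the /items resource are hypothetical; the course itself only sends GET requests, using the requests library.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com/items"  # hypothetical resource, for illustration only

def build_request(method: str, item_id=None, payload=None) -> Request:
    """Build an HTTP request object for a CRUD-style operation without sending it."""
    url = BASE_URL if item_id is None else f"{BASE_URL}/{item_id}"
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)

requests_by_intent = {
    "read": build_request("GET", 42),                              # fetch item 42
    "create": build_request("POST", payload={"name": "x"}),        # add a new item; server picks the id
    "create_or_replace": build_request("PUT", 42, {"name": "y"}),  # write item 42, overwriting if present
    "delete": build_request("DELETE", 42),                         # remove item 42
}
for intent, req in requests_by_intent.items():
    print(intent, req.get_method(), req.full_url)
```

Note how POST targets the collection URL while PUT targets a specific item URL: that is the create-only versus create-or-replace distinction described above.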
Like most software concepts, APIs are best learned through rigorous practice. Because of that, here is a URL that you can use to get a long list of public APIs that are great for practicing. If you've never used an API before, I would recommend proceeding through this course first, which will give you a fundamental understanding of how to work with APIs. Once the course is done, you can go to this public URL and practice with other APIs that are of interest to you personally. So this isn't something to look at right now, but it's a good resource to head to later on.
Alright, now that you have a high-level understanding of the course configuration and some API basics, let's move on to our equal-weight S&P 500 fund. If you've never heard of it before, the S&P 500 is the world's most popular stock market index, and many investment funds are benchmarked to it. That means they seek to replicate the performance of the index by owning all of the stocks that are held in it.
If you've never researched how stocks are added to the S&P 500, there's a committee process and some sophistication involved, but the easiest way to understand it is that the S&P 500 broadly covers the 500 largest companies in the United States. So if you own an S&P 500 index fund, you basically own the 500 largest companies in the US. Aside from that, one of the other most important characteristics of the S&P 500 is that it is market-capitalization weighted. Market capitalization is a finance term that basically means size: it's a company's share price multiplied by its number of shares outstanding, so a large market capitalization just means a large company. Market-cap weighting means that larger companies get a correspondingly larger weight in the S&P 500 index. For the first project of this course, we're going to build an alternative version of the S&P 500 index fund that assigns the same weight to each company in the index. So instead of Apple having a very big weight and Best Buy having a small weight, every company will have the same weight. That's going to be the first project of this course. In the second project, we are going to build a quantitative momentum screener. Momentum investing means investing in the assets whose prices have increased the most.
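A bare-bones version of that idea ranks stocks by their trailing return and keeps the top few. This is a hypothetical sketch, not the course's screener: it assumes you already have a one-year price return for each ticker. The 35% and 20% figures are the Apple and Microsoft returns used in the example that follows.

```python
def top_momentum(one_year_returns: dict, n: int) -> list:
    """Return the n tickers with the highest trailing one-year price return.

    one_year_returns maps ticker -> return as a decimal (0.35 means +35%).
    """
    ranked = sorted(one_year_returns.items(), key=lambda item: item[1], reverse=True)
    return [ticker for ticker, _ in ranked[:n]]

returns = {"AAPL": 0.35, "MSFT": 0.20}
print(top_momentum(returns, 1))  # ['AAPL']
```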
To help you understand, let's go through a quick example. Imagine that you have the choice between investing in two stocks that have had the following returns over the last year: Apple stock has gone up 35%, and Microsoft stock has gone up 20%. A momentum investing strategy would suggest investing in Apple because of its higher recent price return. There are many other nuances to momentum strategies, including the concept of high-quality momentum, which means that, all else being equal, you would want a stock with a steadily increasing price rather than one with no price change for a while followed by a large jump at the end. We'll explore those nuances when we actually build our quantitative momentum strategy later in this course.
The third project we're going to build is a quantitative value screener. Value investing means investing in stocks that are trading below their perceived intrinsic value. To make a very simple analogy, value investing is the idea of buying $1 for 75 cents and hoping that you can sell it again for $1 later. Value investing is a very popular strategy, partly because many of history's best investors, like Warren Buffett, Seth Klarman, and Benjamin Graham, have employed it. Algorithmic value investing strategies rely on a concept called multiples, which are simply a way that investors estimate how valuable a company is. More specifically, multiples are calculated by dividing a company's stock price by some measure of the company's worth, like earnings or assets. Three examples of multiples used in value investing are the price-to-earnings ratio, the price-to-book ratio, and the price-to-free-cash-flow ratio. As the names imply, the price-to-earnings ratio is calculated by dividing a company's stock price by its earnings per share, the price-to-book ratio by dividing the stock price by its book value per share, and the price-to-free-cash-flow ratio by dividing the stock price by its free cash flow per share. There are lots of different multiples you can calculate, but those are three of the most important ones. Each of the individual multiples used by value investors has its pros and cons.
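Since each of those three ratios is a single division, they translate directly into code. This is a minimal sketch with hypothetical per-share inputs; returning None when the denominator is zero or negative is a design choice made here (a P/E on negative earnings is hard to interpret), not something the course specifies.

```python
def multiple(price: float, per_share_value: float):
    """Price divided by a per-share fundamental; None if the denominator is not positive."""
    if per_share_value <= 0:
        return None
    return price / per_share_value

def value_multiples(price, eps, book_value_per_share, fcf_per_share):
    """Compute P/E, P/B, and P/FCF for one stock."""
    return {
        "pe": multiple(price, eps),
        "pb": multiple(price, book_value_per_share),
        "pfcf": multiple(price, fcf_per_share),
    }

# Hypothetical stock: $50 price, $5 EPS, $25 book value per share, $4 FCF per share.
print(value_multiples(50.0, 5.0, 25.0, 4.0))  # {'pe': 10.0, 'pb': 2.0, 'pfcf': 12.5}
```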
One way to minimize the impact of any specific multiple is to use what's called a composite. A composite is just an average of several different valuation metrics.
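One common way to build such a composite, assumed here rather than taken from this part of the course, is to convert each multiple into a percentile rank across all stocks and then average those percentiles, so that metrics on very different scales contribute equally. Because a lower multiple means a cheaper stock, a lower composite score is more attractive in this sketch. The tickers and values are hypothetical.

```python
def percentile_rank(values: list, value: float) -> float:
    """Fraction of values strictly below `value`, plus half of any ties (0.0 to 1.0)."""
    below = sum(1 for v in values if v < value)
    equal = sum(1 for v in values if v == value)
    return (below + 0.5 * equal) / len(values)

def composite_scores(multiples_by_ticker: dict) -> dict:
    """Average each ticker's percentile rank across every multiple.

    multiples_by_ticker maps ticker -> {metric_name: value}; all tickers must
    share the same metric names. Lower scores mean cheaper stocks.
    """
    tickers = list(multiples_by_ticker)
    metrics = list(next(iter(multiples_by_ticker.values())))
    scores = {}
    for ticker in tickers:
        ranks = []
        for metric in metrics:
            column = [multiples_by_ticker[t][metric] for t in tickers]
            ranks.append(percentile_rank(column, multiples_by_ticker[ticker][metric]))
        scores[ticker] = sum(ranks) / len(ranks)
    return scores

# Hypothetical multiples for three tickers.
data = {
    "AAA": {"pe": 10.0, "pb": 1.0},
    "BBB": {"pe": 20.0, "pb": 3.0},
    "CCC": {"pe": 15.0, "pb": 2.0},
}
print(composite_scores(data))
```

Here AAA is cheapest on both metrics, so it gets the lowest composite score; in a real screen the metrics usually disagree, which is exactly when averaging helps.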
So in the strategy that we build in this course, we're going to use a composite of five different value investing metrics. With all of that out of the way, you have a solid understanding of what we're going to be doing in this course, and it's time for us to dig into the first project. So without further ado, let's do just that.
Alright, it's now time for us to start tackling our first project. If you've skipped ahead to this section of the course, there are a couple of things that you should do before proceeding. The first is to rewind to the last section and install all of your dependencies. If you're a fairly experienced Python developer, the main dependencies for this course are NumPy, pandas, and XlsxWriter, so if you already have those on your machine, that's fine. If you've been following along with this video so far: I'm in the algorithmic trading Python folder that we cloned to our local computer, and I have my virtual environment activated. What I'm going to do now is launch the Jupyter Notebook for our first project. The command to do that is python -m notebook, which should open up a browser window. What we want to do is navigate into the starter files folder of the algorithmic trading repository and then open up the equal-weight S&P 500 notebook (an .ipynb file). It should look like this.
This is what we're going to be working through in this video. It has some instructions written in Markdown and some blank code cells, and that's what we're going to complete as we work through this section. So let's start with some background information. What we're going to do in this project is build an equal-weight version of the S&P 500 index fund. The S&P 500 is not an equal-weight index; it's a market-cap-weighted basket of the 500 largest companies in the United States.
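The difference between the two weighting schemes is easiest to see with numbers. In this sketch the three market capitalizations are hypothetical: a market-cap-weighted index gives each company its share of the combined market cap, while an equal-weight version gives every constituent 1/N.

```python
def cap_weights(market_caps: dict) -> dict:
    """Weight each company by its share of total market capitalization."""
    total = sum(market_caps.values())
    return {ticker: cap / total for ticker, cap in market_caps.items()}

def equal_weights(market_caps: dict) -> dict:
    """Give every constituent the same weight, 1/N, regardless of size."""
    n = len(market_caps)
    return {ticker: 1 / n for ticker in market_caps}

# Hypothetical market caps, in billions of dollars.
caps = {"BIG": 600.0, "MID": 300.0, "SMALL": 100.0}
print(cap_weights(caps))  # {'BIG': 0.6, 'MID': 0.3, 'SMALL': 0.1}
print(equal_weights(caps))
```

Under equal weighting, BIG drops from 60% to one third of the portfolio and SMALL rises from 10% to one third.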
Interestingly enough, it actually has more than 500 stocks in it, because five or so of the largest companies in the US have multiple share classes. An example of that is Google, which has a Class A and a Class C share that both trade on the stock exchange. Since Alphabet, the company, is included in the S&P 500, both of its share classes are, and because of that there are usually around 505 stocks in the S&P 500 even though the index only represents 500 companies.
Moving on, what we're going to do now is create a different version of the S&P 500 index fund that doesn't weight the stocks in the index by market capitalization but instead equal-weights them. This means that the larger companies in the index will have less weight than in the traditional version, and the smaller companies will have more weight than in the traditional version. There's some background information here which says that the largest fund benchmarked to this index is the SPDR S&P 500 ETF Trust, commonly known by SPY, the ticker it trades under on the stock exchange. That ETF has more than $250 billion of assets under management, which is pretty crazy; most asset management firms don't have that much. So it's a really big fund, to be sure. It also says here that the goal of this section of the course is to create a Python script that will accept the value of your portfolio and tell you how many shares of each S&P 500 constituent you need to purchase to get an equal-weight version of the index fund.
Like most Python scripts, this project will rely on a number of open-source software libraries, so we're going to import those first. We installed them on our local computer in the dependencies section of this course, and now we just need to import them. We're going to go through them one by one, and I'll briefly explain what each of these libraries does in case you haven't worked with them in the past.
The first library we're going to import is NumPy, and the command to do that is import numpy as np. NumPy is a numerical computing library known for its fast execution speed. The reason NumPy is so fast is that its core is compiled C code, so when you call a NumPy function from Python, it actually executes code written in a different programming language that is faster by design. NumPy is often used in finance and other applications to speed up basic operations like summation or multiplication, because those operations run in that naturally faster language. So that's NumPy.
We're also going to import pandas as pd. The name pandas comes from "panel data", and what pandas does is make it very easy to work with tabular data in Python. Tabular data is anything that has rows and columns; an Excel spreadsheet is perhaps the most commonly known example. pandas is most widely used for its data structure called the pandas DataFrame, which is just a data structure that holds tabular data. pandas allows you to store data in a DataFrame and then use many different built-in pandas functions and methods to manipulate the data within it. You'll see lots of examples of this as you work through this course, and if you're interested in learning more about pandas, freeCodeCamp has an excellent YouTube video on the topic that I would highly recommend. So that's pandas.
There are a couple of other libraries we need as well, but before we move on, it's worth talking about these aliases. We didn't just import NumPy; we imported NumPy as np. And we didn't just import pandas; we imported pandas as pd. So why do we do that? It just saves us a bit of typing, because we're going to be calling functions from these libraries often. As an example, let's run this cell so the imports execute, and then create a new code cell. Say we wanted to create an empty pandas DataFrame. If we had imported pandas without the pd alias, we would have to write pandas.DataFrame().
Here, DataFrame is the class that instantiates a pandas DataFrame object. Because we imported pandas under the alias pd, we can just write pd.DataFrame(), which is a bit more readable and faster to type. It's not necessary to import libraries under aliases, and I know a few developers who just write pandas.DataFrame. But importing pandas under the alias pd and NumPy under the alias np is very common and widely considered a best practice, so that's what we're going to follow in this course.
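Put together, the alias convention looks like this minimal sketch, which creates the same empty DataFrame through both names. It assumes NumPy and pandas are installed; the other imports from the course's import cell are left out because this example doesn't use them.

```python
import numpy as np
import pandas  # full module name, shown only for comparison
import pandas as pd  # conventional alias: the same module object, shorter name

print(pd is pandas)  # True: the alias is just another name for the same module

empty_full = pandas.DataFrame()  # without the alias
empty_alias = pd.DataFrame()     # with the alias: identical result, less typing
print(empty_alias.empty)  # True

prices = np.array([10.0, 20.0, 30.0])  # np is the matching alias for NumPy
print(prices.mean())  # 20.0
```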
Alright, let's move on to importing the other libraries we need. The next one is requests, a very popular Python library that's considered the gold standard for making HTTP requests. An HTTP request is basically a request sent over the internet, for example to an API, to get back some data. In this course, we're going to use the requests library to execute our API calls to the IEX Cloud API and receive the stock market data we need to calculate the weightings for our S&P 500 index fund. So the requests library is for HTTP requests, and that's basically all you need to know about it for now. The next thing we need to import is xlsxwriter (XlsxWriter), a library that makes it very easy to save well-formatted Excel documents from a Python script. The last library we need is math, a Python standard-library module that provides many of the basic mathematical functions you need for operations within Python scripts. Once you run that cell, all of your libraries will be imported. The next thing that we need to do is import our list of stocks.
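Since the script's stated goal is to turn a portfolio value into a number of shares per constituent, math.floor from the math module is a natural fit for the final step. The sketch below is a hypothetical helper, not the notebook's code: it splits the portfolio evenly across tickers and rounds each share count down so the total never exceeds the available cash. The prices are made up.

```python
import math

def equal_weight_shares(portfolio_value: float, prices: dict) -> dict:
    """Whole shares of each ticker for an equal-dollar allocation.

    Each position gets portfolio_value / N dollars; math.floor rounds the share
    count down so no order spends more than its allocated amount.
    """
    position_size = portfolio_value / len(prices)
    return {ticker: math.floor(position_size / price) for ticker, price in prices.items()}

# Made-up prices for three tickers and a $10,000 portfolio.
prices = {"AAA": 100.0, "BBB": 333.0, "CCC": 50.0}
print(equal_weight_shares(10_000, prices))  # {'AAA': 33, 'BBB': 10, 'CCC': 66}
```

Rounding down leaves a little cash uninvested (here $70 of the $10,000), which is usually preferable to an order the account can't fund.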
Like I said, this is going to be a list of about 500 stocks, maybe a little more because some S&P 500 companies have multiple share classes. One thing to note about this project is that the constituents change over time, so in an ideal world you'd connect directly to an index provider or a financial data API; that way, if a stock were added to or removed from the S&P 500, your equal-weight S&P 500 strategy would reflect that. However, all of the tools that provide that information are paid, and this is a free course, so I didn't want to rely on paid resources; that would defeat the point of it being free. What I did instead was save the list of stocks in the S&P 500 into a CSV file, and you can click this link here to download that file. When you click it, the file will go into your Downloads folder, and then we need to move it into our starter files folder so that it can be accessed by the other files in that directory. Let's do that now, and once that's done, we can import it into our Jupyter Notebook. Open up Finder, Windows Explorer, or whatever app your operating system uses to explore files, and navigate to your Downloads folder in one tab. In your other tab, navigate into the algorithmic trading and Python course repository.
Here it is. Specifically, you want to navigate into the starter files folder and then drag the S&P 500 stocks CSV file into it, like that. Now that that's done, we need to save the S&P 500 stocks as a pandas DataFrame. We're going to assign it to a DataFrame named stocks, so stocks = pd.read_csv() with the name of the file. If you type sp and then press Tab, it might autocomplete; there it is. If it doesn't autocomplete, just type in the file name. read_csv is a function in the pandas library that takes in data in CSV form and stores it in a pandas DataFrame. You can verify that by running type(stocks), which should return pandas.core.frame.DataFrame. Yep. And if you print it out, it'll show the data: Jupyter Notebooks have a special feature where, if you put a variable on the last line of a code cell, it will display it. So this is what is contained in our pandas DataFrame. Like I said, there are actually 505 stocks in this DataFrame, which means that five companies in the S&P 500 have dual share-class structures at this time.
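Here is a self-contained sketch of that step. So that it runs without the downloaded file, it reads a tiny inline CSV through io.StringIO; the Ticker column name and the three rows are assumptions for illustration, and with the real file you would pass its path instead (the exact file name isn't spelled out in this transcript).

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV; the real file lists roughly 505 tickers.
csv_text = "Ticker\nAAPL\nMSFT\nGOOGL\n"

stocks = pd.read_csv(io.StringIO(csv_text))  # with the real file: pd.read_csv("<path to the CSV>")
print(type(stocks))  # <class 'pandas.core.frame.DataFrame'>
print(len(stocks))   # 3 rows here; about 505 with the real list
print(stocks["Ticker"].tolist())
```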
That DataFrame of stocks is what we're going to be working with throughout the rest of this tutorial. The next thing we need to do is acquire an API token. Like I mentioned in the introduction, we're going to be using the IEX Cloud API to gather all of our financial data, and like most APIs, it requires some form of authentication before you can pull data from it. What we're actually going to use in this course is the sandbox mode of the IEX Cloud API. Sandbox mode is a kind of play mode where you can make sure that all of your API calls work, but it doesn't return real financial data; instead, it returns randomized financial data. Sandbox mode exists so that you can test whether your API calls function properly before you start using the real API and incur data usage as a result.
In my opinion, the easiest way to handle this when building the course was to create a secrets.py file. By convention, a secrets.py file lives in your local repository, or on the server that's running the code, but never gets pushed to GitHub or any other remote Git repository. The reason is that most of the information stored in a secrets.py file is sensitive or confidential in some way, and you don't want that type of information stored in a centralized Git repository. You'll notice that secrets.py is listed in this course's .gitignore, which means that if you ever fork this repository to work on it and then push your changes up to GitHub, the contents of the secrets.py file don't get pushed as well. Since this is a sandbox API key, it doesn't really matter in this specific instance whether the secrets.py file is exposed. But for future Python courses, or anything else you work on, keep in mind that you should never share your secrets.py file with anyone; this course is an exception only because we're working with a sandbox API key.
To start, we're going to click here, which will download a secrets.py file to your Downloads folder. You may get a warning; in my case, it says "This type of file can harm your computer. Do you want to keep secrets.py anyway?" We're going to keep it and move it into the same starter files folder that we moved our CSV file of stocks into.
So if you open back up Finder, or whatever file manager your operating system uses, you should still have your two windows open, and you can click and drag the secrets.py file into the starter files folder. Then we need to import our IEX Cloud API token into our script. To do that, we write from secrets import IEX_CLOUD_API_TOKEN; if you type IEX and hit Tab, it should autocomplete. Here it says it cannot import IEX_CLOUD_API_TOKEN from secrets. I wonder why that is; this usually works. Sometimes what happens with Jupyter Notebooks is that when you add a file to the working directory the notebook is running in, it doesn't recognize that the file is there until you restart the kernel. So let's try that: we'll restart the kernel, go back to the top, and run all of our code cells from the start. Awesome, this time it imported correctly. I'll often run into little hiccups like that when working through this code. Some courses edit them out for a cleaner experience, but in the majority of cases I'm going to leave them in so that you can see the debugging process and what I actually do to fix the problems I encounter. That was a small example; you'll see more as you work through this course.

Now we're going to make our first API call. Making API calls is a bit of an art, because every API works a little bit differently.
And some APIs have lackluster documentation. In this case, the IEX Cloud API has excellent documentation, and I'm going to show you a bit of it as we work through the section that creates our first API call. For the S&P 500 index fund, there are two things we need: the market capitalization of each stock and the price of each stock. To start, we're just going to make an API call for one single stock. Then, to generalize that to our entire universe of stocks, we're going to loop over every stock in the pandas DataFrame we created earlier and run that same API call for each one using a Python for loop.

So to start, let's create symbol equals a string containing Apple's ticker, which is AAPL. On the next line, we need the API URL. Which API endpoint is that? That's what we're going to find out next. To do that, we'll navigate over to the IEX Cloud documentation. The easiest way to find it is to search Google for IEX Cloud docs; it's the first hit that is not an ad. Click through, and what you'll find is basically a massive single-page resource with all the information you might need about the IEX Cloud API. IEX Cloud has remarkably good documentation. Other APIs that aren't as commercialized or as mature won't have documentation that's this good, so as you look through this, keep in mind that this is probably a better-than-average documentation page for a public API.

Now, the first thing we need is the base URL for the API. The base URL is the URL that starts every HTTP request; after the base URL, you add the specific endpoint you want to retrieve data from. Most APIs only expose certain data through each endpoint, which makes things faster if you only need certain data. For example, with the Google Maps API you might only want the name of a location, so you would send coordinates and get back its name. You wouldn't get back other information like population, because you only want the name, and sending only that limited information is faster than sending information you wouldn't need. So API endpoints are limited in that respect.

But to start, the first thing we need is the base URL. You can see in the first block of text, called API Reference, that the base URL for the API is https://cloud.iexapis.com. We're actually using the sandbox mode of this API in this course, which means we're not charged for any usage, but we get randomized data in exchange. So you couldn't use sandbox mode to run real investment strategies, but it's free, and it's still a good way to learn how data APIs work in general. Because of that, we can't use this base URL. What we need to do instead is press Ctrl+F and search for sandbox.
If we search through the first few results, we find a section called Testing Sandbox. It says that IEX Cloud provides all accounts a free, unlimited-use sandbox for testing; every account is assigned two test tokens, available via the console; and all sandbox endpoints function the same as production, so you only need to change the base URL and token. What does that mean for us? We need to change the base URL, so we grab this sandbox URL instead of the base URL that was provided earlier. Copy it to your clipboard, move back to the Jupyter Notebook, and paste it right there. Awesome, so we have our base URL.

The next thing we need to do is figure out which endpoint we need. Ideally, we want an endpoint that provides both market capitalization and stock price. There is one endpoint in IEX Cloud that provides both of those, but I'm first going to show you how you would find any endpoint that provides one of those metrics, and then I'll show you the endpoint we're actually going to use. So let's start with market capitalization.
Note that IEX Cloud does have a search bar in the top left, but for some reason I always find myself using Ctrl+F instead. So we'll search for market cap rather than market capitalization, because it's sometimes abbreviated. The first hits are enterprise value and price-to-sales. These are both metrics calculated using market capitalization, but they don't actually give us market capitalization, so neither of those is what we need. The next one is a true market cap metric, which is exactly what we need. But if you look through the rest of the response attributes from this endpoint, it doesn't give you the stock price. Like I said, we ideally want an endpoint that has both price and market cap, and with a bit of searching you'll find that the quote endpoint provides both. I just did a Ctrl+F for /quote/, which takes us to the section about the quote endpoint.

Every endpoint description in the IEX Cloud docs starts with a section on how to execute the HTTP request for that endpoint. In this case, we want to execute a GET request, and the endpoint is /stock/{symbol}/quote. If we just copy this and add it to the API URL in our Jupyter Notebook, it won't work, because the {symbol} placeholder doesn't mean anything in a plain string. If we tried to execute this HTTP request as it is, it would probably return a 400 error. So what we need to do is transform this API URL into an f-string.
If you've never used f-strings before, here's a very brief introduction. Say you have string equals "Free Code Camp is awesome", and you need to change part of the string based on some outside variable. We'll create a variable called adjective, a word we want to use to describe Free Code Camp, and set it to "superb" instead of awesome. Then we want to put that adjective variable where the word awesome is within the string. The way to do that is to create an f-string. F-strings are called f-strings because you simply put an f at the start of the string, and this allows you to interpolate values into it: here we add curly brackets and type adjective inside them. Now what this string actually stores is "Free Code Camp is superb", and you can test that by printing the string.

Now we're going to use this functionality to create our API URL. We'll cut this example and get rid of it. The endpoint we pasted already has the curly brackets associated with an f-string, and in addition to the curly brackets, it has the right variable name, symbol, the same one we created here; I think I did that by design while building the code for this course. So all we need to do is add an f in front of the string.
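Here is the f-string idea from above as a small runnable sketch: first the adjective example, then the same interpolation applied to the quote endpoint path. Only the path is built here; the sandbox base URL the lesson pastes in is left out, so nothing below should be read as the full request URL.

```python
# The adjective example: f-strings interpolate variables placed inside
# curly brackets.
adjective = "superb"
string = f"Free Code Camp is {adjective}"
print(string)  # Free Code Camp is superb

# The same idea applied to the quote endpoint path copied from the docs.
symbol = "AAPL"
endpoint = f"/stock/{symbol}/quote"
print(endpoint)  # /stock/AAPL/quote

# Without the f prefix, the braces are kept literally, which is why the
# pasted endpoint did not work on its own.
literal = "/stock/{symbol}/quote"
print(literal)  # /stock/{symbol}/quote
```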
And then if we print this API URL, it should print out the string, but instead of {symbol} it should say AAPL in all caps. Perfect, that's exactly what we need. The last thing we need to do is add an appendage to the end of this API URL that passes in our API token to authenticate us with the IEX Cloud API, basically saying that this user has permission to access this endpoint. The way to do that is to add ?token= and then, using the f-string again, the IEX Cloud token we imported earlier: if you type IEX in all caps and hit Tab, it will autocomplete for you. Now, if we print this (sorry, I didn't actually have a print statement there), it gives us a fully fledged IEX Cloud URL that we can ping using the requests library.

So moving on to that, what we need to do now is execute an HTTP request and store the result in a variable, which we're going to call data. We assign it the result of requests.get. If you've never used the requests library, that's what we imported at the top of our script. It's widely known as one of the best Python libraries for executing HTTP requests, and its tagline is "HTTP for Humans." I quite like that, because it describes well what they do. So we type requests.get and pass in our API URL. Now, if we run that, what is this data variable? The easiest way to check is to call the type function and pass in data. It's a Response object, which lives in the requests.models module of the requests library. This Response object has many different things inside of it, and one of the more interesting ones is the status code. So we can type data.status_code (I'll hit Tab to see if it autocompletes; there it is), and this gives us a 404. Why does it give us a 404? Clearly I did something wrong here. Okay, I just figured out the error.
What I missed is an additional path segment that needs to be added to the IEX Cloud URL, right after the base URL: /stable. Why does that exist? If you look at this section of the docs, there are two versioning conventions you can use: stable or latest. stable is the latest stable API version, while latest gives you access to the newest API version, which may be in beta. So basically, stable is what you want if you want to make sure your application never breaks, and latest is what you want if you want access to the most bleeding-edge features, which might not be fully tested yet. So all we need to do is go back here, type /stable/ before the endpoint, and run this. Now we get <Response [200]>, a 200 status code, which means the HTTP request was executed properly. I actually don't like this printed format; it's not as accessible as the status_code attribute. The reason I like status_code better is that you can test it using the equality operator.
So is this equal to 200? True. That's an easy way to handle HTTP errors: you could say that if the status code is not equal to 200, then do something. Anyway, what we need to do now is deal with the fact that this object isn't in a format we can really work with. So we're going to transform the data that this HTTP request returned into a JSON object using the .json() method, which takes no arguments. Now, if we print the data variable, it gives us a long JSON-style Python dictionary that we can parse values out of.

If you've never worked much with Python dictionaries, they're very useful. I'll make a very quick example here: dictionary = {'a': 1, 'b': 2}. Dictionaries are always created using curly brackets. So this is a Python dictionary called dictionary (very creative naming). The entries here are called key-value pairs: 'a' is the key and 1 is the value. The reason dictionaries are useful is that you can pass in the key and get back the value. Our data variable behaves exactly the same way. For example, this is one key-value pair, and if you want to access the value, which is AAPL, all you have to do is pass in the key, symbol, in square brackets, like data['symbol'], and this should return AAPL. Perfect.
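The request-and-parse steps above can be sketched as two small functions. This is a hedged illustration, not the course's code: build_quote_url and parse_quote are hypothetical helper names, the base URL and token are parameters rather than the real sandbox values, and a SimpleNamespace stands in for the requests.Response object so the example runs without network access. In real use you would pass requests.get(url) to parse_quote; requests.Response also offers raise_for_status() as an alternative to checking status_code by hand.

```python
from types import SimpleNamespace


def build_quote_url(base_url: str, symbol: str, token: str) -> str:
    """Hypothetical helper: quote URL with the /stable segment and ?token=."""
    return f"{base_url}/stable/stock/{symbol}/quote?token={token}"


def parse_quote(response):
    """Hypothetical helper: check the status code, then decode the JSON body.

    Works with requests.Response or anything exposing .status_code and .json().
    """
    if response.status_code != 200:
        raise RuntimeError(f"request failed with status {response.status_code}")
    return response.json()  # the decoded JSON body as a Python dict


# Offline stand-in for requests.get(url); the payload is made up.
fake_response = SimpleNamespace(status_code=200, json=lambda: {"symbol": "AAPL"})
data = parse_quote(fake_response)
print(data["symbol"])  # AAPL
```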
So the two attributes we want from this are market capitalization and price, and we're going to parse those next. We'll write price equals data, and then the right key, but where is the price contained in this response? The easiest way to tell is to do a Ctrl+F for price. You can see there's a whole bunch of different price variables here. The one we want is probably not calculationPrice; latestPrice sounds right. That's the latest price provided by the IEX Cloud API. Now, if you ever see a field and think, "Hmm, I don't know what this means" (latestPrice is kind of a bad example), let's look at something in here that's less obvious, like this variable, extendedChange. What is extendedChange? The IEX Cloud documentation should tell you. If you copy the extendedChange string to your clipboard, go back to the IEX Cloud docs, and run a Ctrl+F search for extendedChange, it will take you not just to this JSON object; there should also be a description below it. As you can see here, extendedChange refers to the price change between extendedPrice and latestPrice. Without knowing what those two mean, this is kind of meaningless, but it was just an example to show you how to look up the meaning of different data points in the JSON response using the IEX Cloud docs. Okay, so that's how to parse price.
We can print it out and see that the price is 515.28. Now, I'm actually not sure whether IEX Cloud price data is randomized in sandbox mode, but you can test it pretty easily by searching Google for NASDAQ: AAPL. Did it ever get to 515? Okay, so I don't think this price is accurate, but it's actually not far off from the real-world price. So this price data is randomized, but, interestingly, it's quite close to the real price. That's how you parse out price. For market cap, we'll write market_cap = data['marketCap'], and similarly, we can print market_cap. Awesome. To make this readable, I'm going to divide it by a trillion. This gives us a market cap of 2.18 trillion, which again I think is pretty close to Apple's real market capitalization. Yeah, they're at 2.13 trillion. So in both cases the values are pretty similar, although still not exactly correct.
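A short sketch of the parsing step on a sample payload. The field names symbol, latestPrice, marketCap and extendedChange appear in the lesson, but the dictionary below is an illustrative subset whose values only echo the sandbox output shown above; a real quote response contains many more fields. someMissingField is a deliberately hypothetical key used to show dict.get.

```python
# Illustrative subset of a quote response; values are for demonstration only.
data = {
    "symbol": "AAPL",
    "latestPrice": 515.28,
    "marketCap": 2_180_000_000_000,
    "extendedChange": 0.5,
}

price = data["latestPrice"]
market_cap = data["marketCap"]

print(price)  # 515.28
# Dividing by 1e12 expresses the market cap in trillions of dollars.
print(f"{market_cap / 1e12:.2f} trillion")  # 2.18 trillion

# .get() returns None instead of raising KeyError when a key is absent.
print(data.get("someMissingField"))  # None
```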
Okay, so that's how you parse an API call. What we're going to do now is scale up that process to all of the stocks in our S&P 500 CSV file, and then store all of the responses in a pandas DataFrame that we'll later save into an Excel file.
So that's what's up next. Alright, the first thing we need to do is exit out of this Ctrl+F box, because it was highlighting everything and causing me some lag. The first real step is to specify the columns of our pandas DataFrame, and to do that we need to know exactly what we're building. We're going to build a pandas DataFrame with a few columns: the ticker for each stock, the stock price for each stock, the market capitalization of each stock, and the number of shares to buy for each stock. Let's start by specifying that in a variable called my_columns, which is just a Python list. The entries are going to be Ticker, Stock Price, Market Capitalization (hopefully I can spell everything right), and Number of Shares to Buy. Okay, so that's the Python list. Now we're going to create a blank pandas DataFrame that has those columns. We'll call this variable final_dataframe, because it will be the final output of this program at the end. To create a pandas DataFrame, you instantiate the pd.DataFrame class, and inside of it we pass columns = my_columns. You'll notice that I didn't name the list columns, because columns = columns is a bit confusing, although I think it would work. Let's try it out. So that does work, but I think it's quite a bit easier to read if you say columns = my_columns; it removes that repetition. Okay, so our pandas DataFrame has been created.
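The blank-DataFrame step can be sketched as follows, using the column labels from the lesson. The second DataFrame shows the list-of-lists form behind the row-of-zeros preview that the lesson uses to show how a DataFrame renders.

```python
import pandas as pd

# Column labels for the equal-weight index fund output described above.
my_columns = ["Ticker", "Stock Price", "Market Capitalization", "Number of Shares to Buy"]

# An empty DataFrame: the columns exist, but there are no rows yet.
final_dataframe = pd.DataFrame(columns=my_columns)
print(final_dataframe.shape)  # (0, 4)

# Passing a list of lists as data creates one row per inner list.
preview = pd.DataFrame([[0, 0, 0, 0]], columns=my_columns)
print(preview.shape)  # (1, 4)
```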
What does it look like? There are two ways to print a pandas DataFrame. As with most things in a Jupyter Notebook, you can either use a formal print statement or just put the variable name on the last line of the cell. This is what it looks like. There's no data in here right now, so it's empty. But we can pass in a list of lists containing a row of zeros, [[0, 0, 0, 0]], and this is how pandas DataFrames typically render. One of the advantages of not using a formal print statement is that when you put the DataFrame variable name on the last line of a code cell, it renders in this nice format where the rows change color when you mouse over them; it's a little nicer to look at. If you use a print statement, it prints like this, in plain text, which isn't quite as nice. Honestly, I'm not a big fan of it at all, so we'll stick to displaying DataFrames this way. So that's what our DataFrame looks like. I'm going to get rid of this row of zeros, because it was just there to show you how DataFrames render, and the empty DataFrame is what we're going to be working with moving forward. Next, I'm going to show you how to append data points to this pandas DataFrame.
To do that, we're going to use the append method: final_dataframe.append. What we need to pass in is a pandas Series that lists all of the entries for the new row. I'll quickly explain what a pandas Series is before we move on, so let's add back this row of zeros to our DataFrame. A pandas DataFrame is a two-dimensional data structure, which means it has rows and columns, whereas a pandas Series is a one-dimensional data structure. If you've worked much in Python, it's similar to a Python list, and if you've worked with NumPy, it's similar to a NumPy array; it just has different methods and functions associated with it.
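To make the one-dimensional versus two-dimensional distinction concrete, here is a small sketch using the row-of-zeros DataFrame. It also shows that a single row or column pulled out of a DataFrame is itself a Series, and that a row's index is made of the column labels.

```python
import pandas as pd

my_columns = ["Ticker", "Stock Price", "Market Capitalization", "Number of Shares to Buy"]
df = pd.DataFrame([[0, 0, 0, 0]], columns=my_columns)

# A DataFrame is two-dimensional (rows x columns).
print(df.ndim)  # 2

# Each row or column taken out of it is a one-dimensional Series.
row = df.iloc[0]       # first row, indexed by the column labels
column = df["Ticker"]  # one column, indexed by the row labels
print(type(row).__name__, row.ndim)        # Series 1
print(type(column).__name__, column.ndim)  # Series 1
print(list(row.index))  # the column labels
```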
But within a pandas DataFrame, every row and every column is a pandas Series. So, in order to append data to the bottom of a pandas DataFrame, that data needs to be a pandas Series. I'll show you what I mean now. Let's take this row of zeros out and rerun this code cell. What we need to do here is create a pandas Series with pd.Series, and the pandas Series accepts a Python list. There are lots of layers here, but what we're eventually working toward is adding a row to this pandas DataFrame. The first thing we're going to add is the ticker, which was stored in the symbol variable earlier. Then we add the price and market_cap variables we created earlier, and then another entry, 'N/A'. What does this N/A mean? We can't calculate the last column, the number of shares to buy, until we have pulled in the metrics for every stock. So for now we'll just set it to N/A, and later on we'll go back in, access that data, and change it once we've actually pulled in data for all of our stocks.

So what happens when we run this code cell? Oh, we missed a comma in there. Now we get an error saying we can only append a Series if ignore_index=True. This is super common when working in pandas: ignore_index=True almost always needs to be added to the append method when you're appending a Series to a pandas DataFrame. So this is what we get, and whoa, this isn't what we wanted. As you can see, all of the columns we created have NaN values in them, which means not a number, and over here on the right we have the data that we actually want. Why is that?
It's because we didn't tell pandas which columns the data belongs in. To fix that, we need to add another argument, index = my_columns. Oh, I missed a comma, I think. No. The problem here is that index = my_columns needs to go inside the pandas Series, not in the DataFrame's append call; a small difference. When we run this (we're missing a comma again), this actually generates what we want. To give a quick recap: we called final_dataframe.append, and within that append call we created a pandas Series containing all of the data points we wanted. We specified index = my_columns on the Series, which labels each value with the column it belongs in, so append knows where to put it. Finally, we added ignore_index = True, which is needed here because the Series has no name to use as a row label.
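One caveat on the append calls in this lesson: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so it only runs on older pandas versions. A minimal sketch of the same step with pd.concat, assuming a recent pandas; the price and market cap values are placeholders rather than real API output.

```python
import pandas as pd

my_columns = ["Ticker", "Stock Price", "Market Capitalization", "Number of Shares to Buy"]
final_dataframe = pd.DataFrame(columns=my_columns)

# Placeholder values standing in for the parsed API response.
symbol, price, market_cap = "AAPL", 515.28, 2.18e12

# index=my_columns labels each value with the column it belongs in.
row = pd.Series([symbol, price, market_cap, "N/A"], index=my_columns)

# to_frame().T turns the Series into a one-row DataFrame; ignore_index=True
# renumbers the rows 0..n-1 in the result.
final_dataframe = pd.concat([final_dataframe, row.to_frame().T], ignore_index=True)
print(final_dataframe)
```

Like append, pd.concat returns a new DataFrame, so the result has to be assigned back to final_dataframe.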
Alright, what we need to do now is loop through every ticker in our stocks variable, execute an API call for that stock, and store the results of that API call in our pandas DataFrame. To start, we're going to overwrite our old final_dataframe variable with an empty pandas DataFrame that has the same columns as before, so columns = my_columns. Then we need to create a for loop: for stock in stocks['Ticker']. To start, let's just print each one with print(stock) and see what happens. Great, we're successfully looping through all the stocks in that DataFrame, so we can get rid of the print statement. Next, we need to actually make an API call for each stock. To do that, we'll scroll up to our old API call cell and copy both of these lines, then scroll back down and paste them in. They need to be indented, and instead of symbol in the URL we're going to write stock. Our IEX Cloud API token is the same, and the data variable is the same, so that's all good.

One thing to note about looping through our list of stocks this way is that it's going to be really, really slow. The reason is that executing an HTTP request means waiting on a network round trip, which is one of the slowest things a Python script can do, so this line here, where we actually execute the request, is very slow. Later on, we'll see how to improve the performance of this code with batch API requests, which make it easy to retrieve information on multiple stocks with a single API request. But that's a more advanced topic, so we're going to go through single API requests first.
Okay, so now we need to recreate the append statement we used earlier. So we write final_dataframe.append. Inside the append statement we create a pandas Series with pd.Series, and inside that pandas Series we create a Python list, so there's a lot going on here. The first thing we add is stock. What is stock? It's the loop variable of this for loop, so it will be the ticker of the stock we're working with. Next we add data['latestPrice']; this is the same parsing we did before, and it gives us our stock price, which is the second column of the DataFrame. Then comes data['marketCap'], the market capitalization, just like we did earlier. And just like before, we put 'N/A' as the last entry. Within the pd.Series call, we need to add index = my_columns, and outside of the Series, as before, we add ignore_index = True. Since this is going to be so slow, like I mentioned, let's just do this for the first five stocks and see what happens. Oh, there's a missing comma right there. Run. Great. Now, if we print final_dataframe, what's going to happen? You'd think it would have information for all five stocks, but nothing's there. Why is that? The reason is that the append method doesn't modify the original DataFrame; it returns a new one. So the easiest fix is to assign the result back: final_dataframe = final_dataframe.append(...). If we run this and then run the following cell, we should see a different output. Awesome.
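Because append (and its modern replacement, pd.concat) returns a new DataFrame instead of modifying the original, a common alternative to appending inside the loop is to collect rows in a plain Python list and build the DataFrame once at the end. A sketch of that variation, not the course's code: fake_quote is a hypothetical offline stand-in for requests.get(url).json(), and its numbers are invented.

```python
import pandas as pd

my_columns = ["Ticker", "Stock Price", "Market Capitalization", "Number of Shares to Buy"]


def fake_quote(symbol: str) -> dict:
    """Hypothetical stand-in for the real API call; values are invented."""
    return {"symbol": symbol, "latestPrice": 100.0, "marketCap": 1_000_000}


tickers = ["AAPL", "MSFT", "AMZN", "GOOGL", "GOOG", "META"]

rows = []
for stock in tickers[:5]:  # limit to five, as in the lesson
    data = fake_quote(stock)
    rows.append([stock, data["latestPrice"], data["marketCap"], "N/A"])

# Building the DataFrame once avoids copying the whole frame on every
# iteration, which repeated append/concat calls do.
final_dataframe = pd.DataFrame(rows, columns=my_columns)
print(final_dataframe)
```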
So this is for the first five stocks in our list. If we remove that limit, this will take a very long time to run; let's see that in action. I'm going to run the print cell too, and while this runs, I'm going to have a little sip of my coffee and cut out the rest of the wait. Alright, as you can see, this is a very long pandas DataFrame that has all of the information we needed. Since it took so long, we're now going to move on to using batch API calls to improve the performance of this code. IEX Cloud, like most data providers, actually gives you discounts if you use batch API calls, because they put a lot less load on their infrastructure.
So that's one reason to use them; the fact that they speed up your code is another. Overall, if you can use batch API calls in your scripts, it's generally good practice, so pay attention to this section, because it's really important. IEX Cloud limits batch API calls to 100 tickers, so what we need to do first is find a way to split our list of tickers into sublists of length 100. It's not super intuitive how to do that, so I searched for how to split a list into sublists in Python. Sorry, that took a bit of time, so I skipped ahead, but here's the function I was talking about. It's called chunks, and in the finished files of this course you'll see that I gave credit to the Stack Overflow answer it comes from. It has 3,000 upvotes, which is pretty crazy, even for Stack Overflow. This is what we'll use to split our list (or, in this case, our pandas Series) into chunks of size n. We'll run that code cell to define this function.
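The chunks function itself isn't reproduced in this transcript. The version below is a common generator-based formulation of the same idea; it may differ in detail from the copy in the course files.

```python
def chunks(lst, n):
    """Yield successive slices of lst, each at most n items long."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Splitting 7 items into groups of 3 gives two full groups and a remainder.
print(list(chunks(list(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because it is a generator, nothing is sliced until you iterate over it or pass it to list().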
Okay, what we need to do now is use this chunks function to create a list of lists where no list is longer than 100 items. To do that, we'll write symbol_groups = chunks(stocks['Ticker'], 100). The reason we have to pass stocks['Ticker'] is that stocks is a pandas DataFrame, and Ticker is the header of its only column. If we just pass stocks, that gives us the whole DataFrame; stocks['Ticker'] gives us a Series, which we can pass into the chunks function, with n set to 100. This gives us a generator, and to get the actual lists we just have to pass it into the list function. So we'll do that and then display symbol_groups to see what it gives us. Okay, this gives us a list of lists, or, more specifically, a list of pandas Series. This one runs from index 0 to index 99, this one from index 100 to 199, then 200 to 299, 300 to 399, and 400 to 499. At the end we have one that's much shorter, running from 500 to 504. That's because there are 505 stocks in the S&P 500 and chunks gives us groups of at most 100; since this last one only has five stocks, it's shorter.

Okay, moving back up, what we need to do now is make a for loop that loops through every pandas Series within that list, executes a batch API call, and then, for every stock in that Series, appends the information for that stock to our final DataFrame. The first thing to do is write for i in range(0, len(symbol_groups)) and print i. This gives us 0, 1, 2, 3, 4, 5, which are all the indices of the symbol_groups list. If we print symbol_groups[i] instead (sorry, that's what I meant to print), this prints out all of the Series within our list called symbol_groups. Now we need to transform all of the stocks in each of those Series into a string, and that string will be passed into the URL of the HTTP request we're executing.
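A sketch of the grouping step on a 505-row Series, reusing the generator-style chunks function (redefined so the block stands alone). The ticker values are hypothetical placeholders; only the count of 505 comes from the lesson. Slicing a Series with a default integer index is positional, and the slices keep their original labels, which is why the last group is labelled 500 through 504.

```python
import pandas as pd


def chunks(lst, n):
    """Yield successive slices of lst, each at most n items long."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Hypothetical 505-row Series standing in for stocks['Ticker'].
tickers = pd.Series([f"T{i:03d}" for i in range(505)], name="Ticker")

symbol_groups = list(chunks(tickers, 100))
print(len(symbol_groups))                       # 6
print([len(group) for group in symbol_groups])  # [100, 100, 100, 100, 100, 5]
print(symbol_groups[-1].index.tolist())         # [500, 501, 502, 503, 504]

# Looping over indices, as in the lesson, visits 0 through 5.
for i in range(0, len(symbol_groups)):
    print(i, symbol_groups[i].iloc[0])
```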
So first, we'll create an empty list called symbol_strings. This is going to be a list of strings, where each string is a comma-separated string of all the stocks in one group. All we need to do is use the append method, symbol_strings.append, and what we append is the result of the join method called on the comma character: ','.join(symbol_groups[i]). If you've never used the join method before, what it does is take all of the elements of the iterable you pass in and join them together, separating them with the string it's called on. To see that in action, we'll print symbol_strings[i]. Alright, here's one example, and here's another. You can tell the examples apart because there's a gap at the end of each one: this example ends here, where there's a gap, and the next one starts on the line after the last line of the first example. So there we have our six symbol strings. I'm going to comment out the print statement, because we don't want it printing every time we run this code. The next thing we need to do is create a blank final DataFrame again.
frame again.
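A small illustration of the join step on its own, using a few hypothetical ticker groups in place of the real pandas Series (join accepts any iterable of strings, so it behaves the same on a Series of tickers):

```python
# Hypothetical groups of tickers standing in for the Series in symbol_groups.
symbol_groups = [["AAPL", "MSFT", "AMZN"], ["GOOGL", "FB"]]

symbol_strings = []
for i in range(0, len(symbol_groups)):
    # ','.join(...) concatenates every element, separated by commas.
    symbol_strings.append(",".join(symbol_groups[i]))

print(symbol_strings)  # ['AAPL,MSFT,AMZN', 'GOOGL,FB']
```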
We're using the same variable name, which is probably not good practice, but we're overwriting it each time. So final_dataframe = pd.DataFrame(columns=my_columns); you know the drill by now. Let's print it out to make sure it looks okay. Awesome, so we can delete that. Okay, the next thing we need to do is loop through every string in our symbol_strings list and use that string to execute a batch API request.
We're going to use a for loop for this, and to keep the nomenclature simple we'll just say for symbol_string in symbol_strings. Then we can print these out to make sure this is working right: print(symbol_string). Awesome, that looks good. We'll take away that print statement now. Next, we need to create a batch API call URL. As before, this is going to be an f-string where we pass in the base URL of the API endpoint, and then pass in each of these symbol strings so that the endpoint returns data for 100 stocks at a time.
If you've never run a batch API call through IEX Cloud before, you probably aren't sure how to do this, so let's go to the docs and do a Ctrl+F for "batch" to see what comes up: batch requests. Okay, so what it says here is to use /market to query multiple symbols, so this stock/market/batch endpoint is what we want. As you can see, one of the parameters you pass in is symbols, equal to a comma-separated list of symbols; then types, equal to a comma-separated list of the API endpoints you want to hit; and then, if applicable, range, which tells it how much data to pull. At the very end you would pass in your token as well. So what we're going to do (oops, did not mean to click that) is copy this example, go back to our Jupyter Notebook, and paste it in here for now. Then we'll scroll up to our last API call and grab its base URL, including the /stable that I missed earlier. We'll copy that here and delete the double forward slash. Then we'll move over to the end of the URL and change a few things. First, we're going to remove the last parameter and the range parameter; we don't need those for this API call. Next, we're going to remove the chart and news endpoints from the example, because we don't need them either.
Next, we're going to remove the symbols that are hard-coded into the example and instead interpolate the symbol_string variable. Then we'll move to the very end and, as before, copy in the IEX Cloud API token that we used in our original API call: copy it, scroll back down, go to the end of the string, and add it. Awesome. Now let's loop with for symbol_string in symbol_strings[:1], which goes up to, but not including, index 1 (so just the first string), and print the URL to see what it gives us. Oh, and I spelled that wrong here, which I'll correct now, and then print batch_api_call_url. Now we're getting an error that the URL variable is not defined: there are three L's in its name here. That's why. Let's run that again. Alright, so this gives us a long URL that looks okay to me. One really easy way to test it is to just click on it. It's a 400 Bad Request, and I'm not sure why. /stable/stock/market/batch: let's make sure that matches the docs, and it does. Then symbols equals the symbol string, types equals quote. Oh, so here's the problem: when you pass multiple parameters to an API request, the first one is preceded by a question mark, and every parameter after that needs to be chained on with an ampersand. So we'll copy this ampersand and replace the extra question mark with it, and I think that should fix the problem. Great.
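The path /stock/market/batch and the symbols, types, and token parameters come from the walkthrough above; the host and token below are hypothetical placeholders, not real values. This sketch only shows the question-mark-then-ampersand rule for chaining query parameters:

```python
# Hypothetical placeholders: substitute your own base URL and API token.
BASE_URL = "https://api.example.com/stable"
IEX_CLOUD_API_TOKEN = "YOUR_TOKEN_HERE"

symbol_string = "AAPL,MSFT,AMZN"

# The first query parameter follows '?'; every later one is chained with '&'.
batch_api_call_url = (
    f"{BASE_URL}/stock/market/batch"
    f"?symbols={symbol_string}&types=quote&token={IEX_CLOUD_API_TOKEN}"
)
print(batch_api_call_url)
```

If you'd rather not assemble the query string by hand, urllib.parse.urlencode from the standard library builds it for you (it percent-encodes the commas as %2C).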
So this has actually opened, in the browser, the JSON that is going to be returned from this API request. Now we can do what we did before and use the requests library to get this data: data = requests.get(batch_api_call_url), and then, as before, call the json method on it. Actually, before we do that, let's print data.status_code. It returns 200, which means it's working properly. Now we can call .json() on the response and store the result in the data variable.

Then we need to parse data from this; more specifically, we need to parse data for every individual stock in this symbol string. Just as we used the join method to join together all of the strings contained in that list, you can use the split method to do the opposite. So we can say for symbol in symbol_string.split(','), splitting on the comma character. Let's print every symbol and see what happens. Awesome: it looks like it's successfully looping through the first 100 stocks. What we need to do now is parse the batch API response for each stock's data in the order that we need, and then append that data to the pandas DataFrame. Just like before, we're going to use the append method: final_dataframe = final_dataframe.append(...). Inside it we specify a pandas Series, and inside the pandas Series we pass a Python list. All of that is exactly what we did before, so I skipped over it a bit. In that list we pass the same loop variable, symbol, and then the price, which comes from data. This time we actually have to do multiple levels of parsing: not only do we have to parse out the metric for that stock, we also have to parse the batch API response to get the information for that specific stock.
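To make the two levels of parsing concrete, here is a sketch against a hand-made dictionary shaped like a batch response (symbol, then endpoint, then metric). The prices and market caps are invented, and the 'N/A' placeholder for the not-yet-calculated column is an assumption:

```python
# Invented stand-in for the decoded JSON of one batch call.
data = {
    "AAPL": {"quote": {"latestPrice": 500.0, "marketCap": 2_000_000_000_000}},
    "MSFT": {"quote": {"latestPrice": 200.0, "marketCap": 1_500_000_000_000}},
}
symbol_string = "AAPL,MSFT"

rows = []
for symbol in symbol_string.split(","):  # split() undoes the earlier join()
    quote = data[symbol]["quote"]           # level 1: the stock, level 2: the endpoint
    rows.append([symbol, quote["latestPrice"], quote["marketCap"], "N/A"])

print(rows)
```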
So here's how you do that: data[symbol]['quote'], so first the stock and then the endpoint, and then we parse out the latestPrice metric. We can follow similar logic to get the market capitalization with marketCap. Just like before, we're not going to fill in the last column; we'll calculate that later. Actually, I'm going to add one last line to this cell, print(final_dataframe), and then run it and see what happens. TypeError: Can only append a Series... This is the ignore_index specification that I mentioned earlier, so ignore_index=True needs to be added right here. And now that I think about it, inside this pandas Series instantiation we also need to pass in index=my_columns.
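One caution if you're following along on a current pandas release: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the final_dataframe.append(..., ignore_index=True) pattern will fail there. A sketch of the same loop that collects plain rows and builds the DataFrame once at the end; the responses are invented stand-ins for requests.get(...).json(), and 'N/A' is a placeholder for the shares column:

```python
import pandas as pd

my_columns = ["Ticker", "Stock Price", "Market Capitalization", "Number of Shares to Buy"]

# Invented responses standing in for requests.get(batch_api_call_url).json().
responses = [
    {"AAPL": {"quote": {"latestPrice": 500.0, "marketCap": 2.1e12}},
     "MSFT": {"quote": {"latestPrice": 200.0, "marketCap": 1.5e12}}},
    {"AMZN": {"quote": {"latestPrice": 3000.0, "marketCap": 1.5e12}}},
]
symbol_strings = ["AAPL,MSFT", "AMZN"]

rows = []
for symbol_string, data in zip(symbol_strings, responses):
    for symbol in symbol_string.split(","):
        quote = data[symbol]["quote"]
        # Shares to buy are calculated later, so store a placeholder for now.
        rows.append([symbol, quote["latestPrice"], quote["marketCap"], "N/A"])

# Build the frame once instead of calling DataFrame.append inside the loop.
final_dataframe = pd.DataFrame(rows, columns=my_columns)
print(final_dataframe)
```

Creating the DataFrame once also avoids copying the whole frame on every iteration, which is what repeated appends did.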
Alright, let's try this. A syntax error: I'm missing a comma between index=my_columns and ignore_index=True. One more time. Awesome. As you can see, we've successfully appended the ticker, the stock price, and the market capitalization, with the number of shares to buy still to be calculated. The only other change we have to make to this code cell is to remove the little [:1] slice right here that made it loop only over the first string. Now when we run this, it will loop over every string and add every stock in our list to our pandas DataFrame. Let's try that. Awesome. As you can see, that was way, way faster than using individual API calls for each stock: the other approach took probably two or three minutes, and this took only a few seconds. So that's a huge improvement. In theory, if all the API requests take the same amount of time, using batch API calls should make it about 100 times faster, because it makes one call for every 100 stocks; in any case, it's much faster. And it looks like we go all the way from A to Z and all the data looks okay, so we're ready to proceed to calculating the number of shares to buy.

Alright, so the reason we left the number-of-shares calculation until later in the script is that I want this to work regardless of how big your portfolio is. To handle that, we're going to create a Python input that asks how large your portfolio is, and based on what you tell Python, it will calculate the number of shares to buy accordingly. To do that, we're going to use Python's input function: portfolio_size = input(...). This input function accepts a string.
That string is the question you want Python to ask you, so here's how it looks: "Enter the value of your portfolio:". If we run this, you can see an input prompt that says "Enter the value of your portfolio", and we can type, say, 1000. Then we can access that value later on in the script: if we print portfolio_size, it prints out 1000. Note that this is actually a string, and that's something we need to make sure we handle. Since it's a string, you could type "I don't have a portfolio", and that would break the rest of the script, because the script is going to try to do mathematical operations on that string, which is obviously impossible. So what we need to do is create a try/except statement. If you're not familiar with try/except, the way it works is: try to do this, and if it doesn't work, do whatever is specified in the except block. One example: if you say 2 + n and n is not defined, the except block can print out "n is not defined". We're going to comment out the input for now so it doesn't run. Now if we run this, it should print "n is not defined".
Awesome. Now if we say 2 + 2, it doesn't print anything, but if we wrap it in a print statement, it prints 4. Okay, so we're going to use this same logic to handle the fact that people might not enter numbers into this portfolio_size input. The first thing to do is uncomment the input. The next thing we need to do is try to handle our input as a float. To do that, we'll create a variable called val and set it equal to float(portfolio_size). The float function takes the portfolio_size variable and tries to convert it to a float. We put that in the try block along with print(val), and in the except block we print "Please enter an integer". Now if we type in a sentence, say "I don't have a portfolio" again, it returns "Please enter an integer".
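The exception float raises for unparseable text is a ValueError, and that's the failure the except branch is really meant to catch. A small sketch (the helper name to_float is hypothetical) that narrows the handler to that one exception. Note that a value typed with a dollar sign or thousands separator is rejected too:

```python
def to_float(text):
    """Convert text to a float, returning None instead of raising on bad input."""
    try:
        return float(text)
    except ValueError:  # catch only conversion failures, not every exception
        return None


print(to_float("1000"))                      # 1000.0
print(to_float("I don't have a portfolio"))  # None
print(to_float("$1,000"))                    # None: '$' and ',' are not valid in float()
```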
Now, this broad try/except statement is actually not good, because it will react regardless of what type of error happens. To make sure it only handles the specific type of error we're trying to address, we need to run the conversion without the try/except and force an error. We'll put a string in there, and as you can see, the type of error that's raised is a ValueError. So we need to change the code so that it only reacts to ValueErrors instead of reacting to all types of errors. The way you do that is to put the type of error right after except, then run it again and see that it handles this properly. Okay, so instead of just handling the error and saying "Please enter an integer", we need to actually make the user redo the input. So how does this look?
Instead of printing "Please enter an integer", we call input again in the except block with the prompt "Enter the value of your portfolio:". Above that, we want to tell users exactly why they're getting the input prompt again. So we'll print, in double quotes, "That's not a number!", then backslash n, which creates a new line, and then "Please try again:". After that, val = float(portfolio_size) again. Awesome.

So just to recap: we try to get the value of someone's portfolio using the input function. In the try block, we try to convert that input to a float and assign it to a variable called val. If that fails with a ValueError, we print "That's not a number! Please try again:", prompt them to enter the value of their portfolio again, and convert it to a float again. One interesting thing to note is that if you enter a string twice, the second attempt will still raise a ValueError, so you have to rely on the user of this script being astute enough not to enter a string twice. It might look something like this: they enter "$1,000" as the value of their portfolio, the script says "That's not a number! Please try again:", and then they realize their mistake and enter 1000, and the script can proceed.

Okay, so now that we've accepted a number from the user, we need to base our share calculations on it. Since this is an equal-weight portfolio, every stock will have the same position size, and "position size" is just a financial term for how much money you're going to invest in each stock. Since there are 505 stocks, an easy example: if your portfolio is $505,000, you would invest $1,000 in each stock. So the easiest way to calculate it is position_size = val / len(final_dataframe.index), i.e., the value divided by the length of the DataFrame, and then print position_size. We entered a small portfolio of $1,000 here, so it's telling us to invest $1.98 in each stock. Let's use something more reasonable, like a million dollars. Awesome: now it says to invest about $19,800 in each stock. Hang on, is that a million? No, I typed 10,000,000, so that's 10 million, and about $19,800 (almost $20,000) per stock makes sense for a $10 million portfolio.
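A sketch of the same input handling written as a loop, so it keeps re-prompting for as long as the answer isn't a number (the version described above retries only once), followed by the equal-weight position size. The function names and the simulated answers are hypothetical; passing a prompt function just makes the sketch testable without a keyboard, and in the notebook the divisor would be len(final_dataframe.index):

```python
def read_portfolio_size(prompt=input):
    """Ask repeatedly until the answer parses as a float."""
    while True:
        raw = prompt("Enter the value of your portfolio: ")
        try:
            return float(raw)
        except ValueError:
            print("That's not a number! \nPlease try again:")


def position_size(portfolio_value, number_of_stocks):
    """Equal weight: split the portfolio evenly across every stock."""
    return portfolio_value / number_of_stocks


# Simulated answers stand in for a user typing at the prompt.
answers = iter(["I don't have a portfolio", "$1,000", "10000000"])
val = read_portfolio_size(prompt=lambda message: next(answers))
print(val)                          # 10000000.0
print(position_size(val, 505))      # roughly 19801.98
print(position_size(505_000, 505))  # 1000.0
```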
Alright, so we have the position size now. Now we need to calculate how many
shares of
each stock need to be purchased to get to that position size in that stock.
So let's
take an easy example. Apple stock price is about $500. Right now. So if we
wanted, you
know, number of Apple shares to buy, you would just say, position, size
divided by Apple
stock price, which is $500. ish. And then you can print the number of Apple
shares.
In this case it says to buy 39.6 Apple shares. This introduces an interesting concept: many brokers do not support fractional trading, which means buying a portion of a share, so you can only buy whole shares. You can buy 39 shares or 40 shares, but you cannot buy 39.6 shares. So how do we handle this? It's tempting to just round to the closest whole number, up or down. But if you end up rounding up more positions than you round down, you'll buy more stock than you have money for. That's not a good thing: you'd get to the end of your allocation and realize you don't have enough money left to reach your position-size target on the last stocks. Because of that, we have to round each of these down, and that's where the math module we imported earlier comes in: math.floor is Python's round-down function. So if we print this, it should give us 39. Oh shoot, that's not defined: it needs to be lowercase. Yes, math.floor, and that gives us 39. Okay, so that's the logical intuition. Now we need to programmatically loop through every row of our pandas DataFrame and apply this logic to the Number of Shares to Buy column.
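The arithmetic behind the Apple example, using the $10 million portfolio and the rough $500 price mentioned above, shows why rounding to the nearest share can overspend a position while math.floor cannot:

```python
import math

position_size = 10_000_000 / 505   # about $19,801.98 per stock
apple_price = 500.0                # rough price used in the walkthrough

exact_shares = position_size / apple_price
print(round(exact_shares, 1))      # 39.6
print(math.floor(exact_shares))    # 39: always rounds down
print(round(exact_shares))         # 40: 40 * $500 = $20,000, more than the position size

# Rounding down guarantees the purchase never exceeds the position size.
assert math.floor(exact_shares) * apple_price <= position_size
```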
To do this, I'm going to comment out the Apple example, because it'll be useful for us later.
Alright, so we're going to say for i in range(0, ...), and we need to go up to the last row of the DataFrame. The easiest way to do that is to use the length of the DataFrame; what I typically use is len(final_dataframe.index). As you can see, this is basically the same logic I used to calculate the position size. Let's use this and print everything in here. Alright, this goes from 0 to 504. Let's make sure that matches the DataFrame we printed earlier: it also goes from 0 to 504. Awesome. Now we need to access the Number of Shares to Buy cell in each row of the DataFrame. The easiest way to do that is with the .loc accessor, final_dataframe.loc[...], which is basically an easy row/column way of accessing data in pandas. For the row we'll pass i, and for the column we'll pass 'Number of Shares to Buy'. Let's print this to see if it prints a bunch of N/A values, because that's what it should print. Awesome, so we're correctly accessing the data. Now we need to actually assign the data. To do that, you use the assignment operator and then math.floor(...), and inside it we do the same calculation as before: position_size divided by the stock price. Now, what do we do for the stock price? Do we have to execute another API call and pull that data from IEX again? We actually don't, because that data is already stored in the pandas DataFrame. So we can take the same .loc logic, paste it in, and just change the column to 'Stock Price'. Oh, the Tab autocomplete isn't working. So if we run this, what happens? It ran correctly.
Let's do this again and print the DataFrame to see if it looks okay. Awesome: as you can see, it correctly shows the number of shares to buy. There are a couple of easy ways to fact-check this. The first is to check that the number of Apple shares is close to the 39 we calculated earlier, and it is, so that's a good sign. Another is to look at two stocks with different stock prices: the one with the higher stock price should get a recommendation to buy fewer shares than the one with the lower stock price. Let's use the first two rows of this DataFrame as an example. The stock with ticker A costs about $100 per share, and we're buying 197 shares. The second stock has a much lower stock price, and because of that, you have to buy many more shares to reach your target position size.
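A runnable sketch of the row-by-row .loc loop described above, on a small frame with invented prices (only the tickers A and AAPL appear in the walkthrough; AAL and all the numbers are hypothetical). The last lines show a vectorized alternative that computes the whole column at once:

```python
import math

import pandas as pd

# Invented prices; the real frame comes from the batch API calls.
final_dataframe = pd.DataFrame(
    {
        "Ticker": ["A", "AAL", "AAPL"],
        "Stock Price": [100.5, 12.0, 500.0],
        "Market Capitalization": [3.1e10, 6.0e9, 2.1e12],
        "Number of Shares to Buy": ["N/A", "N/A", "N/A"],
    }
)
position_size = 10_000_000 / 505

# Row-by-row version: floor(position size / price) for each row.
for i in range(0, len(final_dataframe.index)):
    final_dataframe.loc[i, "Number of Shares to Buy"] = math.floor(
        position_size / final_dataframe.loc[i, "Stock Price"]
    )

# Vectorized version: floor division over the whole column at once.
vectorized = (position_size // final_dataframe["Stock Price"]).astype(int)

print(final_dataframe)
```

Both give the same result; the vectorized form avoids the Python-level loop, which matters more as the frame grows.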
Now, that all looks good, so we can move on. I'm going to remove the commented-out code, tidy this up a little bit, and proceed to the next cell. All right.
This brings us to the last section of this project, which is to save our pandas DataFrame into an .xlsx file so that non-technical users can access it in Excel. The big idea here is that we did all the work in a Python script; you could schedule the script to run periodically, and the output would be sent to your trading team, who would actually buy and sell stocks in response to it. To do this, we're going to use the XlsxWriter library for Python. If you remember, at the start of this project, many lines of code ago, we imported it here, so it's already in our script; we just need to use it now to save our DataFrame as an Excel file. XlsxWriter is an excellent package and offers an insane amount of customization, but the trade-off is that the library can seem a bit complex to new users. I'm going to do my best to explain it well throughout this section, and because of that, this might be a bit long. If this seems easy, or you've used XlsxWriter in the past, feel free to speed up the video a bit or skip ahead.

Alright, so the first thing we need to do is initialize our writer object. We do that by saying writer = pd.ExcelWriter(...). In there, we pass the file name we want to save to, in this case recommended_trades.xlsx, and then we specify the engine, which is xlsxwriter. Let's break this down a little to describe what's going on. As you can see, the class we're initializing is from the pandas library, not from XlsxWriter. The reason is that pandas deals with tabular data, and so much tabular data is saved to Excel files that the two have a very tightly coupled integration, so it's actually easiest to initialize a new writer object from pandas rather than from XlsxWriter. By convention, you always save this object to a variable called writer. That lets you reuse your code without having to change the object name later, and it's just a best practice. So when I say writer = pd.ExcelWriter(...) with the file name we want to save to and engine='xlsxwriter', it may seem a bit redundant, but pandas' ExcelWriter can also write other file types with other engines, so we have to specify that we want to work with XlsxWriter.

So that's the first thing. The next thing we need to do is pass our pandas DataFrame into this object and specify which tab of the Excel file we want it saved to. To do that, we call final_dataframe.to_excel(...). The first thing we pass in is the writer object; the next is the name of the tab we want it saved to, so we'll say 'Recommended Trades' again; and the last is an index=False argument. Alright, so our object has been created. The next thing we need to do is create the formats we need for our .xlsx file. Earlier, when I said this library can be confusing to new users, this is where it gets really complicated: formatting Excel files with XlsxWriter is a science and there's a lot to it. But I'm going to try to keep things simple here, and hopefully you won't run into any hiccups.

The first thing we're going to do is create two variables that specify the color scheme for our Excel sheet. We'll reference those variables in all the formats we create later on, so that if we want to change a color, the change is reflected across all the different formats, instead of hard-coding the color and having to change many different instances of it. Specifically, we're going to create a background_color variable and a font_color variable, which are just empty strings for now. So what goes in here? These store the HTML hex codes for the colors we want to use. I'm going to use two colors that match the style of the freeCodeCamp website; I really like it, it's command-line chic and very modern. So the background color is going to be #0a0a23, and the font color is going to be #ffffff. Alright, so that's that. Now we need to create a few different formats that we will reference when we apply formats to the cells in our Excel sheet later on. As you can see up here, we need a string format, a dollar format with decimals, a dollar format without decimals, and an integer format. Let's start with string_format. What we're going to do here is call the writer.book.add_format method. Inside add_format, we have to pass a dictionary, and this dictionary specifies the format of the cells that will have it applied to them later.
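A sketch of the property dictionaries that end up being passed to writer.book.add_format(...). The keys font_color, bg_color, border, and num_format are XlsxWriter format properties; the two hex codes are my reading of the garbled color values in the transcript, and the '$0.00' pattern is an assumed example, so treat all three values as assumptions. Plain dictionaries are used here so the sketch runs without a writer object:

```python
background_color = "#0a0a23"  # assumed from the transcript
font_color = "#ffffff"        # assumed from the transcript

# Shared properties for every cell format.
string_format_props = {
    "font_color": font_color,
    "bg_color": background_color,
    "border": 1,  # thin solid border around each cell
}

# The other formats reuse the shared properties and add a number format.
dollar_format_props = {**string_format_props, "num_format": "$0.00"}
integer_format_props = {**string_format_props, "num_format": "0"}

print(dollar_format_props)
```

Building the derived formats from one base dictionary means a color change only has to be made in one place, which is the same reason the walkthrough keeps the colors in variables.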
So what goes in here? There are three main attributes: font_color, bg_color, and border. Each of these keys should be a string; let me fix that indenting issue first. I find the indenting in Jupyter Notebooks leaves a lot to be desired. Since this is a dictionary, the keys should be strings, and we have to add a colon after each one. All right, so font_color is going to be our font_color variable, which we defined up here; bg_color is going to be background_color; and border is going to be 1, which just means a solid border around each cell. That's our first format; let's run this code to make sure it works properly. Awesome. Now we need to template off of this and create the other formats. We have string_format, and, as listed in the code cell before this one, we also need dollar_format and integer_format. All of these share the same attributes as string_format; we just need to add one thing to each. For dollar_format, we need to specify a num_format attribute, and the way this works is pretty easy: you pass in a number pattern with zeros in it, templated the way you want the value to be formatted. In this case, it looks like that. Then we do the same thing for integer_format: num_format, and we'll just say '0'. Awesome. Let's run this code cell to make sure there are no syntax errors. Excellent.

Alright, so now we need to apply the formats we just created to our .xlsx file. This is a big, complicated step, so I actually provided an example here, and we can use it as a template for all of the columns in our sheet. What we need to do is say writer.sheets, and then pull the 'Recommended Trades' sheet out of that object. Then we use the set_column method on that sheet to define information about the column. We're going to do this one column at a time first, and then we'll put everything into a loop, because that's more efficient in terms of code and much more readable.
So let's start with column A. You can't just pass 'A'; you have to pass 'A:A', like that. That tells it the column. Then we need to specify the column width; let's use 18 (XlsxWriter measures column width in character units rather than pixels). Then you say which format you want to apply. Column A holds the tickers, so we need string_format, since it's not a number. If you run that, it returns 0, which tells you it ran properly. To actually see this in action, you need to save your writer object: you say writer.save(), and then if you go to the working directory you're running this from, you should be able to open your recommended_trades Excel file and see that the first column has been formatted properly. Let's do that for all the other columns, and then we can change a couple of things to format it better. To do this, we'll copy this line a few times and change the letters: this should be B:B for column B, this should be C:C for C, and this should be D:D for D. Now let's run this and see what happens to the Excel sheet.
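An end-to-end sketch of the writer lifecycle covered so far: create the writer, write the frame to a sheet, add a format, and set the columns. It assumes pandas and XlsxWriter are installed and uses a small made-up frame and a temporary path. Two version notes: ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0, so on newer pandas use writer.close() or a with block as below; and the with block also creates a fresh writer on every run, which sidesteps having to re-initialize it by hand. The header row staying unformatted is likely because pandas writes header cells with their own cell format, which takes precedence over a column format.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for final_dataframe.
final_dataframe = pd.DataFrame(
    {
        "Ticker": ["AAPL", "MSFT"],
        "Stock Price": [500.0, 200.0],
        "Market Capitalization": [2.1e12, 1.5e12],
        "Number of Shares to Buy": [39, 99],
    }
)

path = os.path.join(tempfile.mkdtemp(), "recommended_trades.xlsx")

# Leaving the with block closes the writer, which saves the file.
with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    final_dataframe.to_excel(writer, sheet_name="Recommended Trades", index=False)

    # writer.book is the underlying XlsxWriter Workbook.
    string_format = writer.book.add_format(
        {"font_color": "#ffffff", "bg_color": "#0a0a23", "border": 1}
    )
    sheet = writer.sheets["Recommended Trades"]
    for column in ["A", "B", "C", "D"]:
        # set_column(range, width, format); width is in character units.
        sheet.set_column(f"{column}:{column}", 18, string_format)

print(os.path.exists(path))  # True
```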
Only one column was formatted. Why is that? Oh, I think we have to go back up and reinitialize the writer object. Alright, let's try now. Awesome. As you can see, the columns are being formatted properly, but the headers aren't. Just so you know, that's not something I'm ignoring; we'll solve it in a few cells. What we need to do now is simplify this by making two loops, and each of those loops is going to work with a column_formats dictionary that we're going to create. So let's create column_formats and make it a dictionary. It's going to have a key for each column: A, B, C, and D. The value for each column is going to be a list: the first item is the column title, and the second item is the format we want to apply. So that's 'Ticker' with string_format, 'Stock Price' with dollar_format, 'Market Capitalization' with dollar_format, and the last one is 'Number of Shares to Buy' with integer_format. Let's run that cell. It ran correctly, which tells you the dictionary is formatted properly. Now we're ready to start building a couple of loops that do all the work we did up here, but automatically. The first loop is going to loop through every key in that dictionary: for column in column_formats.keys(). If you've never used the keys method to iterate through a dictionary before, it just returns all of these keys. So if we print(column), it should print A, B, C, D.
Yep. Now what we need to do is call writer.sheets['Recommended Trades'].set_column(...), which does all the work we did up here. Inside set_column, we use an f-string that interpolates the column twice, f'{column}:{column}', so for column A it would be 'A:A'. Then we'll make it 18 wide. Then we pass column_formats[column]: this is the dictionary at key A, which returns this list, and we need the second element of that list. Since Python is zero-indexed, we pass in 1. Great, so that runs properly. Awesome.
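The column loop can be sketched without an Excel file by passing in any object that has a set_column method. Here the format objects are replaced by placeholder strings and RecordingSheet is a hypothetical stand-in for an XlsxWriter worksheet that just records the calls it receives; in the notebook you would pass writer.sheets['Recommended Trades'] and the real formats instead:

```python
# Placeholders for the objects returned by writer.book.add_format(...).
string_format, dollar_format, integer_format = "string_fmt", "dollar_fmt", "integer_fmt"

# Column letter -> [header title, format to apply].
column_formats = {
    "A": ["Ticker", string_format],
    "B": ["Stock Price", dollar_format],
    "C": ["Market Capitalization", dollar_format],
    "D": ["Number of Shares to Buy", integer_format],
}


def format_columns(worksheet, column_formats, width=18):
    """Call worksheet.set_column once per column, e.g. set_column('A:A', 18, fmt)."""
    for column in column_formats.keys():
        worksheet.set_column(f"{column}:{column}", width, column_formats[column][1])


class RecordingSheet:
    """Hypothetical stand-in for an XlsxWriter worksheet."""

    def __init__(self):
        self.calls = []

    def set_column(self, cell_range, width, fmt):
        self.calls.append((cell_range, width, fmt))
        return 0  # XlsxWriter's set_column also returns 0 on success


sheet = RecordingSheet()
format_columns(sheet, column_formats)
print(sheet.calls)
```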
So let's redo all of this: we're going to comment out the earlier one-column-at-a-time lines, put a writer.save() call there, and then go back to the start, reinitialize our writer object, and run all this code. Awesome. Let's take a look at what our Excel file looks like. Great, everything is being formatted properly by that loop. Our code is a lot cleaner and easier to read; it's a big improvement.
Now what we need to do is handle the headers. Header cells in XlsxWriter are kind of hard to deal with, so what I always do is just overwrite them: writer.sheets['Recommended Trades'].write(...). In this write method, you pass in the location of the cell, the information you want stored in that cell, and the format you want to apply to it. So we'll do 'A1', 'Ticker', and string_format. Now if we go back to the top again, reinitialize the writer, run through the rest of this, and go back to our Excel file, the first header cell is formatted properly. Now we just need to do the same thing for Stock Price, Market Capitalization, and Number of Shares to Buy. The easiest way to handle that is to copy and paste. These cells will be A1, B1, C1, and D1.
This one will be 'Stock Price' with dollar_format, this will be 'Market Capitalization' with dollar_format, and this will be 'Number of Shares to Buy' with integer_format. Alright, as before, we're going to go back to the top, reinitialize the Excel writer object, run our code to the very end, and then take another look at our Excel file.
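The four header write calls follow one pattern, so they can be driven by the same column_formats dictionary. As before, the formats are placeholder strings and RecordingSheet is a hypothetical stand-in that records write(cell, value, format) calls instead of touching a real worksheet:

```python
# Placeholders for the objects returned by writer.book.add_format(...).
string_format, dollar_format, integer_format = "string_fmt", "dollar_fmt", "integer_fmt"

column_formats = {
    "A": ["Ticker", string_format],
    "B": ["Stock Price", dollar_format],
    "C": ["Market Capitalization", dollar_format],
    "D": ["Number of Shares to Buy", integer_format],
}


def write_headers(worksheet, column_formats):
    """Overwrite row 1 so each header cell gets its column's format."""
    for column in column_formats.keys():
        title = column_formats[column][0]  # index 0: the header text
        fmt = column_formats[column][1]    # index 1: the format
        worksheet.write(f"{column}1", title, fmt)


class RecordingSheet:
    """Hypothetical stand-in for an XlsxWriter worksheet."""

    def __init__(self):
        self.calls = []

    def write(self, cell, value, fmt):
        self.calls.append((cell, value, fmt))
        return 0


sheet = RecordingSheet()
write_headers(sheet, column_formats)
print(sheet.calls)
```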
Awesome. So everything looks formatted properly. The only thing that might be worth improving is to
widen the columns a little bit, but anyone with an auto-formatter for Excel will be able to handle
that pretty easily. Alright, so this violates the programming principle of don't repeat yourself;
this is very, very loopable. So we're going to create another loop here that does this logic. We're
going to add this in here, and what we need to do is say column here and convert this to an f-string.
So as this loops through, it will go from A1 to B1 to C1 to D1. Then we need to capture this value
here, so it's going to be column_formats[column]. column_formats is the dictionary, and indexing it
with the column gives us the value of a key-value pair. Then we want [0], which is the first entry of
the list that is the value in that key-value pair. So that's this, and then this needs to get
replaced with the second entry of the list. The easiest way to do that is to just copy this and
replace the zero with a one. Alright, as before, we're going to scroll all the way up to the top,
reinitialize our writer object, and move all the way through to the end. Let's open this up and see
if everything works. Okay, great. So this is the final output of the first project from this
algorithmic trading in Python course. It says here that the last step is to save the writer output,
but that's actually this line, so we're already done. And just like that, you built your first
algorithmic trading project in Python. I hope you had fun. The next few projects are going to be even
more fun, and you're going to learn a lot more.
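The dictionary-driven header loop described above can be sketched as follows. This is a minimal sketch, not the course's exact code. It assumes `column_formats` maps each column letter to a `[header_text, cell_format]` list, and the function name `write_headers` and the column width of 18 are illustrative choices. It relies on only two XlsxWriter worksheet methods: `set_column()`, which accepts an `"A:A"`-style range, a width, and a format, and `write()`, which accepts an `"A1"`-style cell, the data, and a format. For that reason the worksheet is passed in as a parameter.

```python
def write_headers(worksheet, column_formats, width=18):
    """Format each column and write its header cell.

    column_formats maps a column letter to [header_text, cell_format],
    e.g. {"A": ["Ticker", string_format], "B": ["Stock Price", dollar_format]}.
    """
    for column, (header, cell_format) in column_formats.items():
        # Apply the format to the whole column, e.g. "A:A"...
        worksheet.set_column(f"{column}:{column}", width, cell_format)
        # ...then overwrite the header cell, e.g. "A1", with the same format.
        worksheet.write(f"{column}1", header, cell_format)
```

In the notebook, the worksheet would be `writer.sheets['Recommended Trades']` from a `pd.ExcelWriter(..., engine='xlsxwriter')`, and the formats would come from `writer.book.add_format({...})`. Note that recent pandas releases have removed `ExcelWriter.save()`; use `writer.close()` there to save the file.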
So I hope you enjoy the rest of this course, and kudos to you for sticking with it so far. All right.
So welcome to the second project for this course, where we're going to build a quantitative momentum
strategy. What we need to do to start, just like before, is open up our Jupyter Notebook. So here I'm
in my terminal, in my home root directory, and what I need to do is navigate into my dev folder.
Within that dev folder, I need to navigate into the algorithmic trading in Python folder, which you
can see autocompleted here for me, so we'll move into there. Now that we're here, we need to activate
our virtual environment. You'll remember from before that the command to do that is
source venv/bin/activate. If you look to the left of your terminal prompt, you can see that the
(venv) in parentheses shows that the virtual environment has been activated correctly. Now that the
virtual environment has been activated, you can type jupyter notebook, and that will launch a Jupyter
Notebook. One thing that I'm not sure I mentioned before is that you can deactivate a virtual
environment by simply typing deactivate. If you then try to launch a Jupyter Notebook, it will
actually fail, because it'll say no module named notebook. So we need to reactivate our virtual
environment and then launch our Jupyter Notebook. This will start the server in our browser, and we
can get started. Just like before, we're going to navigate into starter files and then open up the
second project this time, which is the quantitative momentum strategy. So let's open that to get
started. Alright, so just like before, this Jupyter Notebook starts with a bit of background
information. It says momentum investing means investing in the stocks that have increased in price
the most. For this project, we're going to build an investing strategy that selects the 50 stocks with
the highest price momentum. From there, we will calculate recommended trades for an equal-weight
portfolio of these 50 stocks. So this project will combine a lot of the stuff that we used in the
last section.
That repetition will help you remember what we just did, because I know that last project contained
a lot of stuff. Then we're going to build on top of that. Instead of just equal-weighting an already
selected universe of stocks (the S&P 500), we're going to actually select a subset of those stocks
based on their momentum characteristics. So let's get started by first importing some libraries.
Alright, so just like before, we're going to need NumPy, which we'll import as np, and pandas, which
we'll import as pd. As a quick recap, NumPy is the go-to Python library for numerical computing, and
pandas is a data science library that makes it very easy to work with tabular data in Python. You'll
remember that we worked a bunch with the pd.DataFrame structure in the last project, and we'll do the
same thing in this project. We're also going to need the requests library for making HTTP requests
and the math library for performing some basic math functions. And we're going to need the SciPy
library. Actually, we're not going to need the whole library, just its stats module, so
from scipy import stats. The last thing we need is the XlsxWriter library, which, as we saw in the
last section, allows us to easily format and save Excel files from a Python script. Now, we've seen
all of these before except for the SciPy stats module, which makes it very easy to calculate
percentile scores. We'll see later on in this project that we're going to gather and parse momentum
metrics for all of the stocks in our universe. Then we're going to calculate percentile scores for
those momentum metrics, rank the stocks on those percentile scores, and select the 50 stocks that have
the highest average percentile score across a broad basket of momentum metrics.
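As a preview of how the percentile-score step could work, here is a minimal sketch. It assumes a DataFrame with one numeric column per momentum metric. The function name, the `" Percentile"` column suffix, and the plain unweighted average are illustrative choices, not necessarily what the project ends up doing. `scipy.stats.percentileofscore(a, score)` returns the percentile rank of `score` within `a` on a 0 to 100 scale.

```python
import pandas as pd
from scipy import stats


def top_by_mean_percentile(df, metric_columns, n):
    """Return the n rows whose percentile scores, averaged over metric_columns, are highest."""
    scored = df.copy()
    percentile_columns = []
    for column in metric_columns:
        pct_column = f"{column} Percentile"
        # Where each value falls within its own column, on a 0-100 scale.
        scored[pct_column] = [
            stats.percentileofscore(scored[column], value) for value in scored[column]
        ]
        percentile_columns.append(pct_column)
    # Unweighted mean across metrics; other weightings are equally possible.
    scored["Mean Percentile"] = scored[percentile_columns].mean(axis=1)
    ranked = scored.sort_values("Mean Percentile", ascending=False)
    return ranked.head(n).reset_index(drop=True)
```

When a column has four distinct values, the highest scores 100 and the lowest 25. A stock that leads on every metric therefore ends up first.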
So if you haven't seen SciPy stats before, it's pretty easy; we're just going to use its percentile
score calculation feature. And that's it for our libraries, so let's move on to importing our list of
stocks. Actually, we should run this code cell first. Alright, now that the imports for our
open-source Python libraries are done, we need to import our list of stocks, and we also need to
import our API token. We did both of these steps in the last project as well. So before I code this
out, try to figure out, without looking at your last project, how to import your list of stocks into
a pandas Series and how to import your API token from your secrets file. Pause here and try to do
that before proceeding. Okay, I hope you had success doing that. If you didn't, the command to import
your list of stocks is pd.read_csv(), followed by the name of your list of stocks file. It should
autocomplete if you just type in sp. Then you can print it out to make sure it looks okay. Just like
before, it has all 505 stocks in the S&P 500. For the secrets file, we do from secrets import, and if
you hit Tab, you can see how the API token variable shows up here. So that is great. We can run this
code cell to get both of those imported and move on to making our first API call. Now, this is where
this project starts to be quite a bit different from the last project that we worked through,
because we're going to be pulling different stock market information from the IEX Cloud API.
More specifically, instead of pulling price and market capitalization, we're going to be pulling
price and one-year stock return. To do that, we're going to go to the IEX Cloud docs. Oops, let's try
to keep this all in one browser window. There we go. So, IEX Cloud docs; it's usually the first
non-ad entry in a Google search. Then we're going to do a Ctrl+F and see what kind of return metrics
we can actually find in here. Okay, so "return" is clearly not a good thing to search for, because it
gives 341 hits. You can see why: a lot of these say things like "most endpoints return". Endpoints
return things and stocks also return a percentage, so this search isn't useful. What if we just type
in "momentum"? These are all more complex momentum metrics than we need. Okay, what if we type in
"price return"? Zero hits. What about "pricereturn", with no space? Okay, what if we type in "year
return"? There might be a REST API endpoint that gives us a one-year return or something like that.
Still nothing. What if we try "performance"? Okay, this gives us 21 hits, for sector performance. As
you can see, it's not always super straightforward to find these. Okay, so now I'm searching for
"return" again, and it's a lot of hits.
But I feel like it's going to be near the top of this documentation page. Okay, so this looks
promising: a historical prices endpoint. This would actually get the job done. You could pull in
historical prices for, let's say, one year, divide today's price by the price one year ago, and then
subtract one, and that would give you your one-year price return. However, I know that IEX actually
calculates this metric in a separate endpoint, so we're going to keep moving through this until we
find it.
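The historical-prices approach mentioned above reduces to a one-line formula. This is a minimal sketch. It assumes you already have split-adjusted closing prices in chronological order, and the function names are hypothetical. Note that this computes a price return only, so dividends are ignored.

```python
def price_return(price_now, price_then):
    """Simple price return between two prices: price_now / price_then - 1."""
    if price_then <= 0:
        raise ValueError("price_then must be positive")
    return price_now / price_then - 1


def return_from_closes(closes):
    """One-year return from a chronological list of closing prices covering the year."""
    return price_return(closes[-1], closes[0])
```

For example, a close of 242 today against 100 a year ago gives 242 / 100 - 1 = 1.42, i.e. a 142% gain. That is the same scale used when this section reads Apple's 1.42 as 142%.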
Alright, so I had to cut there. It turns out the word we're actually looking for is "change", not
"return". If I do a Ctrl+F for that, it's the first thing we find, of course. This is the key stats
endpoint, which you can get with the stats keyword in the URL. If you look down here, there's an
awesome group of momentum metrics: max change percent, five-year change percent, two-year change
percent, one-year, year-to-date, six months, three months. In case you can't see all of this, I'll
zoom in a little bit. There we go. Whoa, that's a lot of zoom. There we go. Alright, so: max change,
five-year, two-year, one-year, year-to-date, six months, three months, one month, 30 days, and five
days. So this is the stats endpoint. We're going to copy it and put it in our Jupyter Notebook for
now, to come back to later. We also need the sandbox base API endpoint.
I'll zoom back to normal, because it's a little easier for me to navigate. So here's the testing
sandbox. This is the base URL, and we're going to put it in here, right here. Remember from last time
that we need to include stable here. So this is an example of the base URL that we need to hit with
our API call. We're going to call this string api_url. Then we need to specify a symbol. Just like
before, we're going to use Apple as an example symbol, and we're going to use an f-string to
interpolate that symbol variable into the URL. We can take this right off the end, because we're not
trying to parse a specific stat. Then we can use the requests library to make an easy API call to get
this data. To do that, we're going to say data = requests.get(api_url).
Then we're going to print the status code of that HTTP request to make sure it went through properly.
So this gets a 400; we did something wrong. What is it? The sandbox... oh, we didn't pass in our IEX
Cloud API token. So we need to add ?token= and then type IEX and hit Tab; it should autocomplete.
Let's try that. Alright, so it returns 200, which means the API call is working properly. Let's
transform this into JSON and print it again.
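Here is a minimal sketch of the single-stock request URL and the lookup that follows. The sandbox base URL is an assumption based on the course's setup, and IEX Cloud's sandbox may no longer be available. The token is passed as the `token` query parameter, as in the walkthrough. The helper names are hypothetical, and the network call is left out so the helpers can be checked offline.

```python
# Assumed sandbox base URL from the course setup; it may no longer be served.
SANDBOX_BASE_URL = "https://sandbox.iexapis.com/stable"


def stats_url(symbol: str, token: str, base_url: str = SANDBOX_BASE_URL) -> str:
    """Key-stats URL for one symbol. Leaving the token off is what produced the 400."""
    return f"{base_url}/stock/{symbol}/stats?token={token}"


def one_year_change_percent(stats: dict) -> float:
    """The key-stats response is a flat dict of metric name -> value.

    year1ChangePercent is a fraction, so 1.42 means +142%.
    """
    return stats["year1ChangePercent"]
```

In the notebook, you would pass the URL to `requests.get()`, confirm that `status_code == 200`, and then call `.json()` on the response before handing the resulting dictionary to the lookup.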
Alright, so as you can see, this contains all the information that it should contain. The
information that we specifically want is the one-year price return for every stock in the universe,
and getting it is actually quite easy. To parse a JSON object like this, we saw in the last section
that you just pass the key of the dictionary in square brackets. The key that we want now is the
one-year change percent. These keys generally don't start with numbers; they start with letters, which
is why it says year5 instead of 5year, and so on. This is the one we want, year1ChangePercent, so
we're going to copy it, move down here, and add it in. This should give us 1.42. So what does that
mean? It means that Apple has increased in price by 142% in the last year, which is pretty
impressive: the stock has more than doubled. But I guess that's what happens when you sell a lot of
expensive iPhones. Just like before, we're going to loop through all of the stocks in our universe and
make API calls for all of them. Looping through them one by one, as we did in the last section, is
extremely slow. So we're going to move right on to executing batch API calls, because that's a better
practice, and there's no point practicing single API calls when you'll generally never make them in
practice. The first thing we need to do is chunk our list of stocks into groups of 100, using the
chunks function that we created last time. To make this easy, I've included a bunch of reusable code
from the last project here, so just run this code cell and that will give us the chunks function. We
use the chunks function to divide our list of stocks into symbol groups of length 100. Then we create
an empty list called symbol_strings. For every group of 100 stocks in the symbol groups list, we
create a comma-separated string of tickers and append it to this symbol_strings list.
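The chunking step can be sketched like this. It is a minimal version that assumes tickers arrive as a plain list or a pandas Series. The course ships its own `chunks` helper, and this generator only illustrates the idea. `batch_symbol_strings` is a hypothetical helper that does the comma-joining.

```python
def chunks(lst, n):
    """Yield successive n-sized slices of lst; the last slice may be shorter."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


def batch_symbol_strings(tickers, batch_size=100):
    """Join each group of tickers into one comma-separated string for a batch request."""
    return [",".join(group) for group in chunks(list(tickers), batch_size)]
```

With 505 tickers and a batch size of 100, this produces six strings, and the last one holds five tickers. That matches the count you see when printing the strings in the walkthrough.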
If you go through every symbol string in that list and print it (we did this last project as well, but
we're going to do it again just for practice), you'll see that this is basically just five lists of
length 100. When I say lists, they're not Python lists; they're strings of tickers separated by
commas. So here's one, here's another, here's the third one, the fourth one, the fifth one. I said
there are five, but there are actually six, sorry; that's because we have this short one of length
five at the end. Okay, so that's all of this; I'm going to delete it. Now we're going to move on to
creating a blank pandas DataFrame. Just like before, we're going to call it final_dataframe.
We're going to instantiate it by saying pd.DataFrame, and then columns=. You'll see up in this code
that I've defined the names of the columns we want: Ticker, Price, One-Year Price Return, and Number
of Shares to Buy. These are basically the same column names we used in the last project, except that
Market Capitalization has been replaced by One-Year Price Return. So we can go down here and specify
those columns by saying columns=my_columns. If we print this, we should see an empty pandas DataFrame
with the columns we specified. Awesome, so that's good. We're going to delete that. Next we need to
loop through all of the symbol strings in our symbol_strings object and create a batch API call.
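Here is a minimal sketch of the batch URL being built in this part. It assumes the course's sandbox base URL, which may no longer be served. It also assumes IEX Cloud's documented `/stock/market/batch` form with `symbols` and `types` query parameters. The helper name and the default `types` tuple are illustrative. The walkthrough starts with `stats` alone and later adds `price`.

```python
# Assumed sandbox base URL from the course setup; it may no longer be served.
SANDBOX_BASE_URL = "https://sandbox.iexapis.com/stable"


def batch_url(symbol_string, token, types=("price", "stats"), base_url=SANDBOX_BASE_URL):
    """One request for many comma-separated symbols and one or more endpoints."""
    return (
        f"{base_url}/stock/market/batch"
        f"?symbols={symbol_string}&types={','.join(types)}&token={token}"
    )
```

Each string produced by the chunking step becomes one `symbol_string`, so 505 tickers need six requests instead of 505.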
So: for symbol_string in symbol_strings. Then we want to create a batch_api_call_url, which we're
going to leave as an empty string for a moment. Then we want to create a data variable. That's going
to be a requests.get() call that accepts batch_api_call_url, and then we call the .json() method on
the result to transform it from a response object into a JSON object. So what goes in here? Let's go
back to the IEX Cloud docs to look at the batch API call syntax again. Alright, so I just did a Ctrl+F
for "batch" to find this section, which tells you everything you need to know about batch requests:
stock, then the symbol, then batch. Actually, that's not the query we want. This syntax here is for
querying multiple endpoints for one stock, but we want to query one endpoint for multiple stocks, so
we want to use this syntax here instead. So I'm going to copy this link address.
I thought this would just give us the last bit, but it gives us the entire thing. We're going to
change this to sandbox, because by default the docs give you a live URL, not a sandbox URL. Then we're
going to move down here and change a few things. First, let me move this up here so you can see it.
To start, let's take off this last parameter and the range parameter; we don't need either one. We're
also going to have to add back our token=, and then IEX Cloud... oh, sorry, this isn't an f-string, so
that won't autocomplete. Yep, IEX Cloud API token. Then we're going to want to change the endpoints.
So what goes there? Let's scroll back up to see which endpoint we queried earlier. Alright, so we used
the stats endpoint.
So let's put that into our batch API call: stats. Then here, we want to change our symbols to an
interpolated variable that matches the loop variable of our for loop, which is symbol_string.
Alright, so now we need to run over this loop once, just to make sure that we're getting a 200 HTTP
code, which tells us the HTTP request is being executed properly. To do that, we're going to print
data.status_code. For this to work, we have to temporarily remove the .json() call from the HTTP
request. Where we specify what we loop over, we're going to change this to symbol_strings[:1], which
tells it to loop only over the first entry of that list. So if this is working properly, we will get
a response of 200. Awesome, so that's good. Now we can get rid of the status_code attribute on the
data object and call the .json() method on it to see the structure of the HTTP response. This is
basically a huge dictionary with many levels. At the top level, there is a key for each ticker, with a
dictionary as the value. Within that dictionary, there is a key for the endpoint, stats, with a
dictionary as the value. And within that dictionary, there's a key for every metric, with that
metric's value. So we have to do multiple levels of parsing here. But before we do that, we have to
loop over all of the stocks, so let's build that loop. First off, we're going to say
for symbol in symbol_string.split(','), splitting on the comma character, which separates all of the
tickers in the symbol string. Let's see what that split method returns before we go any further: copy
that, comment that out, and then print it. Oops, this is what I want to print. There we go. Awesome.
So let's run that and see what it gives us. As you can see, this is a Python list where each item is
one of the comma-separated symbols in the symbol string. Okay, perfect. So we'll get rid of this (I
did not mean to open my dev tools), uncomment this, and start parsing the data in this loop. For
every ticker in this list, we want to append the relevant metrics to the final DataFrame. To do that,
we're going to say final_dataframe = final_dataframe.append(). Within this append method there are a
few things we need. We need a pandas Series, and that pandas Series will accept a Python list. It
will also accept the index=my_columns argument. As you'll recall from the first project, that
argument tells the append method where to put the new metrics in the existing pandas DataFrame. Then,
outside the pandas Series, we're going to add ignore_index=True. Okay, perfect. So this is where we
want to do our parsing. The first metric we want is the symbol. The second metric we want is price.
To get price, we'll need to take data, parse the symbol, and then parse the stats endpoint. Then we
have to see what the price metric is actually called in there. So let's just print out one example
of this. Oh, sorry, right there, right here.
Let's print data, using Apple as an example ticker, and then stats, and see what this gives us. All
right, it's giving us an error, because we didn't format the rest of it properly. But let's look and
see what we have here for price. I'm going to do a Ctrl+F for "price" and see if anything highlights:
market cap, employees... It doesn't look like this has any price data. So we need to add price to our
endpoints; we're going to say price,stats. Let's go back. If we parse price, I think this just gives
us 504. Okay, perfect. So this is what we need to do for price, and we're going to replace Apple with
the symbol that we're looping over. Then we're also going to pass in the one-year price return, and we
can use this print statement up here to see what we need. We're getting an error here because we're
trying to append two values to a DataFrame that has four columns. So for now, I'm just going to put in
'N/A' placeholders for both of the missing values. For the number of shares to buy, that N/A
placeholder is going to stick, so we only need to figure out the one-year price return. We'll run this
to get rid of that error and then look in here to see what we need to parse out for the one-year price
return. All right, "year1" is what we need to search for. Okay, here's the metric: year1ChangePercent.
So we'll copy that and paste it there for now.
Then we copy this, paste it there, and add the metric as an additional level of parsing. Alright,
let's run this. I'm going to take this [:1] off, which means that instead of looping over just the
first symbol string, we're going to loop over all of the symbol strings. Then, inside this big loop,
I'm going to print the final DataFrame so you can see what it looks like. With any luck, this should
be okay. Sorry, I'm going to do this again and remove that print statement. Alright.
'float' object is not subscriptable. Huh, that's because we need to change price to stats in this
lookup: the price endpoint returns a plain number, so it can't be indexed any further. Alright, so
this will take a second to run; it's a pretty big DataFrame to build. Awesome, everything looks good.
So just to recap what we did here: we created an empty pandas DataFrame with the columns specified
here. Then we looped over all of the symbol strings in our symbol_strings object. For each one, we
created a batch API call URL that hits two different endpoints: the price endpoint and the stats
endpoint. We used the requests library to execute an HTTP request and get that data in the form of a
JSON object. We split each comma-separated string of symbols into a list using the split method. Then
we looped over that list and appended the data for each symbol to our pandas DataFrame. Finally, we
printed the pandas DataFrame, and it looks like this.
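The parsing loop recapped above can be sketched as follows. It makes two assumptions. First, the batch response has the shape described in this section: symbol, then endpoint, then metric, with `price` returned as a bare number. Second, rows are collected in a list and turned into a DataFrame once at the end. That second choice departs from the notebook on purpose. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the `final_dataframe.append(...)` pattern fails on current pandas. The function names are hypothetical.

```python
import pandas as pd

MY_COLUMNS = ["Ticker", "Price", "One-Year Price Return", "Number of Shares to Buy"]


def rows_from_batch(data, symbol_string):
    """Turn one batch response (symbol -> {"price": ..., "stats": {...}}) into row lists."""
    rows = []
    for symbol in symbol_string.split(","):
        rows.append([
            symbol,
            data[symbol]["price"],                        # a bare number, so no further indexing
            data[symbol]["stats"]["year1ChangePercent"],  # stats is a nested dict of metrics
            "N/A",                                        # filled in once the portfolio size is known
        ])
    return rows


def build_dataframe(batches):
    """batches: iterable of (data, symbol_string) pairs, one per batch request."""
    rows = []
    for data, symbol_string in batches:
        rows.extend(rows_from_batch(data, symbol_string))
    # Build the frame once instead of appending row by row.
    return pd.DataFrame(rows, columns=MY_COLUMNS)
```

In the notebook, each `data` would be `requests.get(batch_api_call_url).json()` for one symbol string.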
Everything seems to be working so far, so we can move on to removing low-momentum stocks from our
pandas DataFrame. Alright, like the other code cells in this project, this one has a bit of
background information. It says the investment strategy that we're building seeks to identify the 50
highest-momentum stocks in the S&P 500. Because of this, the next thing we need to do is remove all
the stocks in our DataFrame that fall below this momentum threshold. To do that, we'll sort the pandas
DataFrame by one-year price return and then drop all stocks outside the top 50. That sounds easy
enough, and pandas makes it very easy to sort data within a DataFrame, so we're going to use some
built-in functionality of the pandas library to do this. The first thing we're going to do is take
final_dataframe and call the pandas method that sorts a DataFrame based on the values in its columns.
That method is sort_values, and it takes a few different parameters. The first one is the column that
you want to sort by, so we're going to use One-Year Price Return. If you just type in "One" and then
hit Tab, it might autocomplete. If it doesn't, you can just go up here and copy it. Copying is always a
good idea, because there are lots of different ways you could spell this, and it helps you avoid
typos, although typos are a pretty easy bug to fix. Alright, so One-Year Price Return. The next
parameter is whether you want it to sort in ascending or descending order. We want descending order,
so that the highest-momentum stocks are at the top, so we'll say ascending=False. The last thing we
want to say is inplace=True.
Now, if you've worked with pandas a lot in the past, you're definitely familiar with this
inplace=True parameter. It makes the method modify the original DataFrame instead of just returning a
sorted copy. To see this in action, you can take it away and run this, and it shows the stocks with
the highest one-year price return at the top. But if you run that line and then print the original
DataFrame right below it, you won't get the sorted DataFrame; you'll just get the original one. This
shows that without inplace=True, the method doesn't modify the original DataFrame. If you add
inplace=True, that's no longer the case, and printing it shows the sorted DataFrame. So that's how we
sort the values.
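The effect of `inplace=True` described above can be checked on a toy DataFrame. The tickers and returns below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "One-Year Price Return": [0.10, 0.90, 0.45],
})

# Without inplace=True, sort_values returns a new, sorted DataFrame...
sorted_copy = df.sort_values("One-Year Price Return", ascending=False)
# ...and df keeps its original row order.
order_after_plain_sort = list(df["Ticker"])

# With inplace=True, df itself is reordered and the call returns None.
returned = df.sort_values("One-Year Price Return", ascending=False, inplace=True)
order_after_inplace_sort = list(df["Ticker"])

print(order_after_plain_sort)    # ['AAA', 'BBB', 'CCC']
print(order_after_inplace_sort)  # ['BBB', 'CCC', 'AAA']
print(list(df.index))            # [1, 2, 0]: the old labels travel with their rows
```

The shuffled index in the last line is why the index gets reset later in this part.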
Now we need to modify the pandas DataFrame so that it only includes the first 50 rows. That's actually
quite easy: you just pass [:50] (colon 50, not -50) in square brackets like this. The indices are all
different now because the DataFrame has been sorted, so I'm going to check len() to make sure this is
right. Yeah, it's 50. Great, so this has 50 stocks now. To actually modify the original pandas
DataFrame, all you need to do is say final_dataframe = final_dataframe[:50]. Now if you print it, it
will return a DataFrame with 50 rows. Awesome. One last thing we can do is reset the index, so it
doesn't have this kind of random list of numbers. The easiest way to do that is with the pandas
reset_index method, which we can call here without any extra arguments. As you can see, it does the
job.
Now, just like before, without inplace=True this doesn't modify the original DataFrame, so we'll have
to specify that in here. As you work more with the pandas library, you'll notice that inplace=True is
available on many of the methods you use regularly. So there we go. Just to recap what we did here:
we took our final_dataframe object, which contains price and momentum data on all the stocks in the
S&P 500, and sorted its rows by their one-year price returns so that the highest-momentum stocks are
at the top. We used the inplace=True parameter to modify the original DataFrame instead of just
returning a temporary copy. Once that was done, in the next line, we modified the DataFrame so that it
only contains the 50 stocks with the highest price momentum.
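The keep-the-top-50 and re-index steps can be wrapped in a small helper. This sketch uses `head(n)`, which keeps the first n rows just like the notebook's `[:50]` slice. It also passes `drop=True` to `reset_index`. By default, `reset_index` keeps the old index as a new column named `index`, which you usually don't want here. The function name is hypothetical. `df.nlargest(n, column)` is an alternative built into pandas.

```python
import pandas as pd


def top_n_by(df, column, n=50):
    """Return the n rows with the largest values in `column`, re-indexed from 0."""
    top = df.sort_values(column, ascending=False).head(n)
    # drop=True discards the old, shuffled index instead of keeping it as a column.
    return top.reset_index(drop=True)
```

Usage would look like `final_dataframe = top_n_by(final_dataframe, 'One-Year Price Return')`.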
Then we used the pandas reset_index method to change the index of the DataFrame so that it runs from
0 to 49. Finally, we printed out the final DataFrame on the last line.
So with that out of the way, we can move on to calculating the number of shares to buy for this
simple momentum strategy. This part is similar to the last project. The only change is that we're
going to wrap the functionality of accepting the portfolio size inside a function. We'll reuse that
functionality later in this tutorial, when we build a better and more robust strategy. You can see
that heading here; we start it in the next code cell. Wrapping it in a function probably isn't
strictly necessary, since you could just copy and paste the code (it is a Jupyter Notebook, after
all). But it's definitely better to create functions if you're ever going to reuse any code. And since
we're not building many functions in this Python course, it's a good excuse to practice. If you're
familiar with Python functions, and since we already built this same functionality in the last
project, pause the video here and try to complete the function yourself without watching me code
through it. So break now and try that yourself. Alright, we're going to say def portfolio_size(). If
you're not familiar with function syntax in Python, this defines a new function called
portfolio_size, and the empty parentheses mean that it accepts no parameters. The first thing we're
going to do is create a global variable in here. That way, this function can define a variable that
is then accessible outside the function. To do that, we write global, and we're going to call this
variable portfolio_size. Then we're going to say portfolio_size = input(), and just like before, the
prompt will be "Enter the size of your portfolio:". Now, as we saw in the last project, we need some
special handling here so that if someone enters a string into this Python input, it tells them to
re-enter a float value.
So we're going to use a try block: try float(portfolio_size). If this raises an error, we're going to
print "That is not a number!", then print again and say "Please try again:". After that, we repeat
basically the exact same input functionality. Great. The only other change we need to make is to
narrow this except statement so that it only catches the type of error that happens when someone
enters a string. That type of error is a ValueError, so we'll do that here.
Actually, looking at this, two separate print statements are a bit unnecessary. So I'm going to copy
this text up here, join the two messages with a newline character, and try that.
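Here is a variation on the function described above, as a sketch. The notebook version sets a global and re-prompts only once. This one keeps asking until the answer parses and then returns the number. The `read` parameter is a hypothetical hook that defaults to the built-in `input()`, which makes the function easy to test. Catching only `ValueError` follows the walkthrough.

```python
def portfolio_input(prompt="Enter the size of your portfolio: ", read=input):
    """Ask for a portfolio size until the answer parses as a float, then return it."""
    while True:
        raw = read(prompt)
        try:
            return float(raw)
        except ValueError:
            # float() raises ValueError for non-numeric text such as "one million".
            print("That is not a number! \nPlease try again:")
```

Calling it as `portfolio_size = portfolio_input()` also sidesteps the function-versus-variable name clash, since no global is involved. Note that an answer with a thousands separator, such as "1,000", is rejected by `float()`.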
Okay, so let's run this and see if it defines properly. Awesome, it does. We're going to use this
function to accept a portfolio size now, and we'll use it again later for our better and more
realistic momentum strategy. But first we need to make sure it works. Before we proceed, one thing
worth noting is that this function should be called portfolio_input. The exact name doesn't matter,
but it definitely should not be portfolio_size, because then it has the same name as this variable.
Giving a function and a variable the same name is an easy way to introduce logical bugs into your
code. So we're going to rename it to portfolio_input and run this code cell to define it. Then we'll
test it by running portfolio_input() and printing portfolio_size.
Here it's especially important to print the variable that's defined inside the function, because we
need to make sure the global keyword is working properly. So let's run this and see what happens.
Enter the size of your portfolio: we'll enter 1000, and it should print 1000. Awesome. Now let's try
this again with a string. I'll say "my portfolio is too small to matter to you". That's kind of sad,
but we're going to try it, and it says "That is not a number! Please try again:". Okay, so my
portfolio is 1. Awesome, so our portfolio_input function is working properly. Now we can move on to
looping through our pandas DataFrame and calculating the number of shares to buy. The first thing we
need to do is calculate our position size. To do that, I'm going to enter a slightly larger
portfolio: not 1000, not a million, let's do 10 million. And what we need to do is calculate position
size.
To calculate the position size, we set position_size equal to float(portfolio_size) divided by the length of our pandas DataFrame, final_dataframe. You can take the length of any column, but I usually use .index for this, so the divisor is len(final_dataframe.index). Let's print it out to see what the position size is. Is that 200,000? Yep, 200,000. That makes sense, because we're just doing 10 million divided by 50, so the math checks out. The next thing to do is loop through all of the rows in the pandas DataFrame. For that, we'll say for i in range(0, len(final_dataframe.index)).
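The position-size arithmetic and the row range can be sketched as follows. The 50-row frame is a made-up stand-in for final_dataframe, since only its length matters here. portfolio_size is kept as the string that input() returns, which is why it goes through float().

```python
import pandas as pd

# Hypothetical stand-in for final_dataframe: 50 rows, one per selected stock
final_dataframe = pd.DataFrame({"Ticker": [f"TICKER{i}" for i in range(50)]})
portfolio_size = "10000000"  # what input() would return for 10 million

# Equal-weight position: the portfolio value split evenly across every row
position_size = float(portfolio_size) / len(final_dataframe.index)
print(position_size)  # 200000.0

# range(0, n) yields 0 .. n-1, matching the default integer index labels
rows = list(range(0, len(final_dataframe.index)))
print(rows[0], rows[-1])  # 0 49
```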
Let's just print i to see what happens. This goes from 0 to 49. If we go back up to our pandas DataFrame, we can see that the first index is 0 and the last index is 49, so this looks like it should work properly. Let's take away this print statement now. Next, we're going to loop through all of the entries in the Number of Shares to Buy column. The way to do that is with pandas .loc functionality: we pass in i for the row, and we pass in 'Number of Shares to Buy' for the column.
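The transcript breaks off here, so what follows is only a minimal sketch of how the loop could fill the column with .loc. It rests on assumptions that are not in the text: the frame has a Price column, the numbers are made up, and fractional shares are rounded down with math.floor. The sketch also resets the index first. .loc looks rows up by label, and those labels only match range(0, n) when the frame has a default 0 .. n-1 index. Sorting or slicing a frame keeps the old labels.

```python
import math

import pandas as pd

# Hypothetical data: made-up prices, sorted so the index labels get shuffled
final_dataframe = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "Price": [100.0, 300.0, 40.0],
    "Number of Shares to Buy": ["N/A", "N/A", "N/A"],
}).sort_values("Price", ascending=False)

# Relabel rows 0 .. n-1 so loop positions line up with .loc labels
final_dataframe = final_dataframe.reset_index(drop=True)

position_size = 1200.0  # hypothetical amount per position
for i in range(0, len(final_dataframe.index)):
    # .loc[row_label, column_label]; round down to whole shares
    final_dataframe.loc[i, "Number of Shares to Buy"] = math.floor(
        position_size / final_dataframe.loc[i, "Price"]
    )

print(final_dataframe)
```

The explicit loop mirrors the walkthrough. pandas can also compute the whole column in one vectorized expression from the Price column.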