Title: Python Blitz: No More Guessing
Author: James Smith
What is being said
“I loved the way Jim explained many of the steps outlined in this book with humor and wit. And, I learned some neat tricks as well!”
Been following James a few weeks now and I have to say his predictions on many college basketball games have been spot on. The tools he uses definitely produce results. Anyone would be wise to
pick up Python Madness.
"Just got my copy of #PythonMadness. Looking forward to sharpening the #Python skills. Thanks for putting a product out there that makes it fun for aspiring programmers to learn code."
-Tomek Sulkowski @Tomek_ATL
"I have known this guy a very long time and he is both a superb coder and very empathetic. If you want to
learn Python, buy this book asap before all the other kool kidz do...."
Also by Jim Smith
Introduction to Python Blitz
I am a College Football nut, Roll Tide! I love to do ESPN's College Pick'em
page with a group of friends. It is fun during the football season to compete with others, especially if your favorite team does well. OK, now I will describe some of the "stuff" that happens during the season.
Week one, I will get on bunches of sports pages listening to the popular
talking heads and their predictions. I will then take their consensus choices and
fill out my ten picks. Then I will order my picks carefully based on how sure the
sports pundits were when they imparted their wisdom. I eagerly wait for the
matches... And... I do OK. After all, during week one, the big teams are playing
cream puffs. Next week will be better...
Week two. I decide to get on the Vegas odds pages and take the line or the
spread and use these numbers to pick my teams. The results are still not as good
as I want... After all, Vegas just wants to make money for themselves.
I have tried picking the teams with blue football fields, nope. Teams with a
good kicker, nope. Teams with the best looking uniforms, nope. Teams having
the most NCAA recruiting violations, nope.
Being a computer programmer of 30+ years and wanting a better way, I have
created a set of Python tools to automate the prediction process. Now once a
week you can easily and automatically create a spreadsheet to pick your teams!
How cool is that!
To get my prediction data I scrape publicly available college football statistics
pages. The computer language I use is free and open source Python. My tools
will run on Windows or on Linux.
So if you want to have fun, not spend lots of money, and teach yourself some
programming, get my book, download my provided open source software and
beat your buddies picking the winners each week. The software is yours to use,
and to change or modify as you wish. Knock yourself out and have fun! Enjoy
Copyright © 2018 Jim Smith
All rights Reserved
Independently Published in the USA
I dedicate this one to my wife, LuAnne; we have spent many nail-biting games shouting at the TV and at the blind refs. College football is in her comfort zone. I love you, Hon.
Chapter 1 – Windows Setup
Chapter 2 – Linux Setup
Chapter 3 – Python Blitz Setup
Chapter 4 – The Scraper Tools
Chapter 5 – Tackling a Difficult Page
Chapter 6 – The Merge Tools
Chapter 7 – The Combine Tools
Chapter 8 – Testing, Testing, Testing
Chapter 9 – Beginning the Season
Chapter 10 – Predicting One Match-Up
Chapter 11 – Predicting Per Week
Chapter 12 – Measuring Your Success
Chapter 13 – A Few Coding Projects
Chapter 14 – My Equipment
Chapter 15 – Closing Words
“We can’t run. We can’t pass. We can’t stop the run. We can’t stop the pass. We
can’t kick. Other than that, we’re just not a very good football team right now.”
My Windows system is currently running Microsoft Windows 10 Professional. The current build number is 10.0.16299. The shell I run is PowerShell.
Installing Python version 3
(the programming language)
In order to run my software you need to install Python version 3. Don't worry, the software is free and open source, and it is easy to install. Hop over to https://www.python.org and make sure to download the correct version: the Python Blitz software requires version 3.x. Install the 64-bit version.
There is an option in the install kit to also install pip; make sure to choose this option. Because the Python language is very modular, you can use pip to install additional Python modules that my software requires and that are not installed with the base package.
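A quick sanity check from the shell, plus an example install. Note that beautifulsoup4 is just one module the scrapers use; the authoritative list presumably lives in the project's Pipfile, so treat this as an illustration only:

```shell
# On Windows the launcher is usually `python`; `python3` is the Linux spelling.
python3 -m pip --version
# Then a module can be added like so (example only):
#   python3 -m pip install beautifulsoup4
```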
Installing Libre Office
(the spreadsheet you will use)
When my software runs, it creates spreadsheet files; a spreadsheet gives you a lot of power for analyzing the sports picks. Windows has a spreadsheet called Excel; however, the price for me is too high. LibreOffice is my solution. It is free and open source.
Go to the following page, https://www.libreoffice.org, and download the office suite.
Installing Microsoft Build Tools
Now, we need a free Microsoft package in order to run the Python Blitz software. This package is a C compiler plus other low-level tools. Once they are installed, the Python Blitz software will simply run faster and better.
Install Git (source control tools)
In the case of Linux, Git just gets installed... in the case of a Windows system you need a freaking PhD?! Just picking the standard defaults is OK, but I have also found this page, https://hackernoon.com/install-git-on-windows-9acf2a1944f0, for some explanation of the settings and for suggestions on what to pick. Also, Git will require some additional setup after installation; this is spelled out on this page: https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup.
My Python Blitz software uses no symbolic links, so not checking this option
is just fine.
Power Shell Settings
To run my Python Blitz tools you need to run from either the MS-DOS command prompt or the Windows PowerShell prompt. Go to Start and search for either cmd (the MS-DOS prompt) or PowerShell.
I suggest PowerShell; this is Windows' newer shell and it has a lot going for it. In order to be able to run software from the shell, you will need to make some settings changes. First we need to launch the PowerShell prompt as an administrator (right-click on the prompt and pick "Run as administrator"). Now type in get-executionpolicy.
If this shows as Restricted, which is the default, then you need to switch the policy to RemoteSigned and answer yes to the prompt. Now type in get-executionpolicy again and make sure it says RemoteSigned. Finally, get out of Admin mode.
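The commands referenced here are presumably the standard execution-policy pair; shown below as a sketch, to be typed into the Administrator PowerShell window:

```powershell
Get-ExecutionPolicy                    # "Restricted" is the default
Set-ExecutionPolicy RemoteSigned       # answer Yes at the confirmation prompt
Get-ExecutionPolicy                    # should now report "RemoteSigned"
```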
One more change I suggest is to add an alias to your PowerShell startup scripts file so the Python Blitz tools will be easier to run. We will create an alias named "prp" to replace having to type in "pipenv run python" every time we run the Blitz tools.
To add the alias, use an editor and edit the file Microsoft.PowerShell_profile.ps1, located in the WindowsPowerShell folder under your user Documents folder.
Add the following two lines to your profile, save, and get out of PowerShell and back in. You now have an easy way to launch your Python Blitz scripts.
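The profile lines appear as an image in the print edition; a plausible sketch of what they look like is below. PowerShell aliases cannot carry arguments, so a helper function does the work; the function name here is my invention:

```powershell
function Invoke-Prp { pipenv run python $args }   # hypothetical helper name
Set-Alias prp Invoke-Prp                          # now `prp script.py` works
```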
OK, your Windows system is now set up and ready to run the Python Blitz software. You can skip down to the Python Blitz setup chapter.
My primary system is my trusty Thinkpad P50 running Ubuntu 17.10, the Artful
Aardvark release. Compared to Windows, it is much quicker and easier to get
Ubuntu running my Python Blitz software. This chapter will make sure all the
prerequisites are there and running.
Installing Python3 (if necessary)
First we need to determine if the Python3 package is already installed. Open a terminal shell and type python3; if you see the interpreter start up, then it is already installed. If you do not see this, then to install it, type in the following:
sudo apt update
sudo apt install python3.6
To see if Pip is installed, type in the following:
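The check shown in the book's screenshot is presumably pip3 --version; the python3 -m pip spelling below is equivalent and works even when the pip3 wrapper is not on the PATH:

```shell
python3 -m pip --version    # same check as `pip3 --version`; prints the pip version when installed
```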
If Pip is missing, then to install it type in the following:
sudo apt-get install python-pip
sudo apt-get install python3-pip
I suggest installing both Pips, even though Pip3 is the one we will use for the Python version 3 package.
Installing LibreOffice (if necessary)
Ubuntu version 17.10 already comes with LibreOffice installed, so let's make sure it is there: click on the Activities menu and enter "Libre" in the search box; if you see the following icons, then you are good!
If you do not see the spreadsheet icon, then type the following in a terminal window:
sudo add-apt-repository ppa:libreoffice/ppa
then simply launch Software Updater and upgrade the office suite.
Installing Git (if necessary)
Open a terminal window and do this:
sudo apt-get install git
Installing Zsh (my favorite shell)
sudo apt-get update && sudo apt-get -y install zsh
OK, now that it is installed we need to make it your default shell. In order to
do this type in the following:
chsh -s /bin/zsh
OK, now get out of your shell, log out, log back in, and then re-launch your terminal shell. It will now detect that you are using Zsh and give you some defaulting options; pick option 2 here.
We also need to make a few changes to the .zshrc file. To better support Git, I suggest changing to the clint prompt; it will give you some Git tool goodies (like showing what branch you are currently on).
Another mod to the .zshrc file is to create an alias so that pipenv is easier to run. I don't know about you, but typing "pipenv run python" seems a bit excessive; I prefer running "prp" instead.
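Both .zshrc changes can be sketched as the lines below. The clint theme ships with zsh's bundled prompt system, and the alias name matches the book's "prp" choice; treat this as a sketch, not the author's exact file:

```zsh
# in ~/.zshrc
autoload -Uz promptinit && promptinit
prompt clint                       # Git-aware prompt theme bundled with zsh
alias prp='pipenv run python'      # shortcut for running the Blitz tools
```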
This is what my favorite shell looks like. Pretty cool, huh!
In order to properly consider yourself a coder, you need a version control
system. The Git software does this, but you also need a reliable way to keep
track of and back up your software. I keep my software up in the clouds on
Github. Github is a free web page that stores your software on their servers.
As I develop software and change my code I will quickly push it up to the
Github server. Then, if something bad should happen to my laptop, I can be
assured that my code is somewhere safely stored.
OK, you need to get yourself a Github account. Don't worry, it is free and they
are good. Lots and lots of coders besides myself trust them. Go to
https://github.com/. There is a registration page, they want a username, an e-mail
address and a password. Please sign up; I'll wait.
OK, signed up and logged in? Good! Now let's get the Python Blitz software. Please go to my repository located on Github at this location: https://github.com/meprogrammerguy/pyBlitz. Remember, you are logged in to Github, but you are now at my Github pyBlitz repository. Now go to the top right of the page and press the "Fork" button. After doing this, Github will think and then jump you back to your page, but you will now have a "Fork" of my repo, also called pyBlitz. This is your own personal copy of the Python Blitz software, stored up in the cloud.
What we want to do now is bring this software down to your PC so you can install and run it and, if you wish, make your own changes to the code. To do this we will clone your repository down to your PC. First, I suggest making a Git directory and setting yourself into it.
You can now click on the Github "Clone" button; doing this copies the long URL you would otherwise need to type in. To clone, type:
git clone https://github.com/<your github user name>/pyBlitz.git
After this command completes your repo will be located in git/pyBlitz. Set
yourself into the directory and do a dir; you should see this.
OK, now there are several Python modules that my code requires to run properly; we need to load them in now. There are two main ways to do this: you can globally load them onto your PC, which isn't the best way, or you can use a package tool called Pipenv to locally load the modules just for your project.
First we install Pipenv using the Pip tool like this.
sudo pip3 install pipenv
(for windows leave off the sudo)
Then from your pyBlitz directory run this command
sudo pipenv install --three
(for windows leave off the sudo)
This will create an environment with all the proper modules loaded so that the
Python Blitz software can run.
In the case of a Linux system you may need to give execute permission to all of the files having a .py extension. To do this, open up your file manager, right-click, and pick the Properties option; then go to the Permissions tab and check the "allow to execute" box. Do this for all of the .py files in the pyBlitz directory.
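Clicking through the file-manager dialog for every file gets old; from a terminal the same permissions change is one command. The directory and file names below are placeholders:

```shell
cd "$(mktemp -d)"                          # stand-in for your pyBlitz directory
touch scrape_schedule.py score_matchup.py  # placeholder .py files
chmod +x *.py                              # give execute permission to all of them
ls -l *.py                                 # the x bits now show for both files
```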
Let's quickly test if the Python Blitz software is set up. Type in the following:
prp score_matchup.py --test --verbose
If you see "test passes" after the code runs, then you are good! Now, let's learn how the scraper tools work.
In order to be able to make my calculations, I need information and statistics; fortunately, the web is chock full of tasty goodness. In order for my code to use this information, I have written several scrapers that store the information I require locally in files.
A scraper is a Python script that obtains the page information, then manipulates it and stores it in files on your PC. Because the information on these pages changes throughout the year, the scrapers need to be run right before the game match-up is predicted.
The Abbreviations Page
I found a team abbreviations page to solve a problem with the ESPN team
schedule page. When a team wins, the page shows the home team abbreviation, their score, a comma, and then the losing team abbreviation followed by its score. This is stupid and illogical; also, you do not know the abbreviation until the game has been played.
Luckily, the abbreviations do not change, and I found a page having them.
This page is pretty accurate, although a few of the abbreviations do not match
up. Because the page I scrape is from about 2014, a few abbreviations are
missing. I have added two missing teams in the code.
Using Beautiful Soup to parse the page, it turns out the second table is the one I want. To determine this, you can right-click on the page in your browser, view the page source, and then do a ^F (find) for the tag "table".
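The book uses Beautiful Soup; as a dependency-free sketch of the same idea (pulling the cells out of the second table on a page), here is a standard-library version. The HTML snippet and team data are invented:

```python
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped per <table>."""
    def __init__(self):
        super().__init__()
        self.tables = []        # one list of cell strings per table
        self.depth = 0          # table nesting depth
        self.in_cell = False
    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.tables.append([])
        elif tag in ("td", "th"):
            self.in_cell = True
    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False
    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.tables[-1].append(data.strip())

html_page = """
<table><tr><td>some other table</td></tr></table>
<table><tr><th>Team</th><th>Abbrev</th></tr>
       <tr><td>Alabama</td><td>ALA</td></tr></table>
"""
parser = TableCollector()
parser.feed(html_page)
second_table = parser.tables[1]   # the second table is the one we want
print(second_table)               # -> ['Team', 'Abbrev', 'Alabama', 'ALA']
```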
After grabbing the data, I need to store it in a useful way on my PC. I decided to store the file in two different formats: a JSON file, because it is easier to manipulate in code, and a CSV file, so I can look at the data in a spreadsheet if I wish.
A machine way (JSON) and a human way (CSV, spreadsheet).
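The author's exact file layout isn't shown here, but the two-format idea can be sketched with the standard library; the team rows below are invented:

```python
import csv
import json

teams = {"Alabama": "ALA", "Auburn": "AUB"}   # made-up sample rows

# The machine way: JSON, easy to load back into Python code.
with open("abbreviations.json", "w") as f:
    json.dump(teams, f, indent=2)

# The human way: CSV, which opens straight into LibreOffice Calc.
with open("abbreviations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Team", "Abbreviation"])
    for team, abbr in sorted(teams.items()):
        writer.writerow([team, abbr])
```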
The Betting Talk Page
After having the team spread figured out, I wanted a way to take the spread and assign win percentages to my prediction. This is a statistical look at college football, assigning a win percent to each spread.
This is a great page showing the NFL, college basketball, and, for us, college football. For the betting talk scraper, the table we want has the id "tablepress-23"; the way to determine this is to right-click in your browser and pick show page source. I also take the table and convert it to a JSON file and a spreadsheet file.
The Outsiders Page
Early in the process of creating the Python Blitz software, I found a popular
statistics page called outsiders. I created a scraper and tried using this page in my
calculations. However, during testing, I did not care for the numbers I was
getting, so I am not currently using this page in my calculations. I left the scraper script behind; if you wish to play with it or use it, feel free.
Getting The Yearly Schedule
The goal of my tool is to be able to automatically predict all the games in a
year including the bowl games. ESPN has a good page for this, so I created a
scraper called scrape_schedule, to grab every week and the bowl games into
weekly schedule spreadsheets. All of the weeks are already figured out; only the bowls are yet to be determined. You will need to do a scrape at the beginning of the year, and again at the end of the year once the bowls are determined.
This scraper figures out a few pieces of info that are pretty cool. It determines which team is the home team, or whether the field is neutral; statistics collected over many years show that the home team has about a one-touchdown advantage when figuring out who wins. It also detects if a match-up occurs in January and corrects the year in the spreadsheet it builds; the January match-ups are the final few bowl games at the "end" of last year.
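Those two tricks (the home-field bump and the January year fix) can be sketched like this; the seven-point figure comes from the one-touchdown rule of thumb above, but the function names and signatures are illustrative, not the author's actual code:

```python
HOME_EDGE = 7.0   # the roughly one-touchdown home-field advantage cited above

def adjusted_margin(raw_margin: float, home: bool, neutral: bool) -> float:
    """Add the home-field bump unless the game is on a neutral field."""
    return raw_margin + (HOME_EDGE if home and not neutral else 0.0)

def schedule_year(season_year: int, month: int) -> int:
    """January match-ups are last season's final bowls, so they get next year."""
    return season_year + 1 if month == 1 else season_year

print(adjusted_margin(3.0, home=True, neutral=False))   # -> 10.0
print(schedule_year(2018, 1))                           # -> 2019
```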
The Team Rankings Page
The team rankings page has tons of statistics about college football. I took a look and found several pages that I am using in my calculations. The specific stats I am interested in are "plays per game", "points per play", "opponent points per game", and "opponent points per play". I mash all these numbers together and obtain an average points scored per game. I use this figure, along with the spread, to guess at what the final score will be. Also, teams can be "hot" or "cold" throughout the year, so I use numbers from the teams' last three games.
That is a description of all the scrapers I created, except for one. The toughest one was quite uncooperative; it was like hunting wolverines in Alaska. Because of this, it gets its own chapter.
In the previous chapter I dealt with all the scrapers except one. For my calculations, I wanted a spread number for each team; this is a way to estimate approximately how much one team will beat another team by. I Googled around and found a page I liked, called the Born Power Index. It has a lot going for it: it has been around many years, and it keeps track of lots of college teams.
I initially wrote my scraper like the others, but it did not work, because this page is different from the others. In order to get your info, you first need to answer some questions on the first page, and then it will offer up a different page that has the information you really want.
In order to understand what is happening in the background, a tool is needed. Don't worry, though; this tool is contained in most browsers. I use Chrome, so I'll describe the Chrome way of getting the information I need.
Let's start snooping! Open up your browser and go to this page:
Now let's open up the Chrome developer tools: go to the top right of your browser and you will see three small lines (called the hamburger menu). Open this menu up and you will see the option called "more tools"; click there, and another menu will show. Click on "developer tools". Now something real cool happens: the actual Born Power Index page gets crammed to the left side of the browser, and on the right a "scary" screen having several tabs appears. Click on the network tab.
OK, now go to the left side of your browser and answer the questions necessary to get the info we want. On the upper left, open up the menu and pick "football", then "College", and finally "Class". You will now see another page having some more questions, but this is still not the page we require. The classification defaults to "1A", which is good; now click the "send the query" button. Cool, we now see the page we want, showing a table of the football teams.
Now, move over to the right side of the browser: stuff has happened. All of the hidden requests the page made are listed there! Put on your detective hat and let's dig in. You will see a lot of entries; each one corresponds to browser activity from the left. Remember, we are only interested in the last page that loaded. Go to the bottom and click on the DBRetrieve.pl item. Now, further to the right is the information we need to get that pesky table.
Open up the item called "Request Headers"; this is the information we send across that is normally not seen. If you look at the code in the file scrape_bornpowerindex, you will see a header structure where we are going to send this info across. You can just copy and paste the info into a Python dictionary structure like I have done.
The second item we need is called the form data. Go to the right, close "Request Headers", and open up the one called "Form Data". Notice there are three things here: "getClassName", "class", and "sort". Once again, we can copy these into a Python dictionary structure, like I have done in the code. Because I want all teams, and not just the "1A" teams, you will see six different form data structures in the code to retrieve all of the info I want.
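The shape of the two dictionaries looks roughly like this; every value below is a placeholder, so copy the real ones out of your own DevTools session:

```python
# Placeholder values; copy the real ones from the DevTools panes.
headers = {
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Referer": "http://www.bornpowerindex.com/",   # the page the query came from
}
form_data = {
    "getClassName": "placeholder",   # the three fields seen in the Form Data pane
    "class": "1A",
    "sort": "placeholder",
}
# With the third-party Requests library, the hidden page would then be
# fetched with something like:
#   resp = requests.post(url, headers=headers, data=form_data)
```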
The final item we need is called the "Request URL"; this is the page we really want, the one that has the table of teams. Go to the right and look at this info. To see this URL, you need to open up the "General" section; it will be the first line. The hidden page we really want is http://www.bornpowerindex.com/cgi-bin/DBRetrieve.pl.
OK, we now have everything we need to correctly ask for the exact pages we want. We use Beautiful Soup to parse the data and create two files: a JSON file that the Python code will better understand, and a spreadsheet file that we can pull up to view our information.
As Hagrid says: "You're a wizard, Harry!"
There is a common general programming problem: how to match up data and information from different places. In the Python Blitz code, data comes from two different statistics pages, an abbreviation page, a yearly schedule of teams page, a unicorn page... maybe not that page.
In order to do less work matching up these pages, I use a concept called fuzzy string matching. I have added a Python module called FuzzyWuzzy, which uses pattern matching to make similar strings match up. For example, W. Tongo, West Tongo, and Western Tongo could all be the same school.
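FuzzyWuzzy is a third-party module; the same idea can be sketched with the standard library's difflib. The school names here are invented:

```python
from difflib import SequenceMatcher

def match_ratio(a: str, b: str) -> int:
    """A rough 0-100 similarity score, in the spirit of FuzzyWuzzy's ratio()."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

candidates = ["West Tongo", "Eastern State", "Northern Tech"]
target = "W. Tongo"
best = max(candidates, key=lambda name: match_ratio(target, name))
print(best)   # -> West Tongo (it scores far above the other two)
```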
Because the matching up of teams is a bit tedious, I have done much of the heavy lifting: when you downloaded the Python Blitz code, you also downloaded my default merge spreadsheets. However, in case you want to build these from scratch, just follow along.
Running the merge_abbreviation script will create a spreadsheet which looks like this:
Notice the highlighting of the "match ratio" column. This is a calculation of how likely the teams match: 100% means spot on, and the lower the percentage, the more you need to be concerned. Go to the spreadsheet and highlight this column, then go to the Data menu and sort the column in ascending order. The spreadsheet will pop up the following dialog:
Just pick the extend selection. Now your spreadsheet is sorted from low to
high percentage of matches. Look over the teams and the abbreviations and
make sure that the data matches. For example: make sure that Florida is not
mistaken for Florida State. Make any corrections in the correction columns and
save the spreadsheet.
There is a merge_stats and a merge_schedule script as well; the merge spreadsheets are stored in the /data sub-directory. These spreadsheets can be updated to make sure that when the predictions are made, the data being used is correct.
When a merge script runs, it will open up the spreadsheet, if it exists, to obtain defaults. If you want to start from scratch, you can delete the spreadsheet before running the merge.
I have described the scraping scripts and the merge scripts. There are two more scripts I need to mention; I call them combine scripts. These can be thought of as data manipulation scripts: they place the scraped data into files so the prediction scripts can use it quickly.
The Merge Master file
(everything in one place)
The first combine script does some heavy lifting; it combines all of the pages which were scraped into one file called merge. This file can be thought of as the key file: the primary key is the Born Power Index team names, and the other keys are used to cross-reference the other pages used in the prediction routines.
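One plausible shape for a record in that merge file, with the Born Power Index name as the primary key; all field names and values here are invented for illustration:

```python
# A sketch of one merge-file record; field names are invented.
merge_record = {
    "Alabama": {                      # primary key: Born Power Index team name
        "espn_abbrev": "ALA",         # cross-reference into the schedule page
        "teamrankings_name": "Alabama Crimson Tide",
        "bettingtalk_name": "Alabama",
    },
}
lookup = merge_record["Alabama"]
print(lookup["espn_abbrev"])   # -> ALA
```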
The Stats File
(two pages brought together)
The other combine tool is called combine_stats.py. This script takes the Born Power Index spread and combines it with the three-game data from the popular Team Rankings page, which has about every statistic that you could want. The information I am using is the points each team scored over its last three game appearances. I wanted a way to guess at how many points each team will score, and the points scored over the last three games is a way to know whether a team is currently hot or cold during this part of the season. The stats spreadsheet looks like this:
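The hot/cold signal boils down to an average over the last three games; a toy version follows, with invented per-game points:

```python
def last_three_average(points_per_game):
    """Average points over a team's most recent three games."""
    last_three = points_per_game[-3:]
    return sum(last_three) / len(last_three)

season_so_far = [21, 17, 35, 42, 38]        # made-up points per game
print(last_three_average(season_so_far))    # averages the 35, 42, and 38
```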
Testing is very important in coding; after all, if your system does not provide accurate answers, then no one will want to use it. With Python Blitz I have taken the great volume of information available online and tried to distill it down to something useful.
If you run either the score_matchup or the score_weekly script and pass a --test argument, the code will provide data to the calculation routines found in the pyBlitz module and return a pass or fail score. This becomes valuable if you have modified the pyBlitz routines and start seeing failures; when this happens, you know that a possible mistake in the code has caused answers that no longer match the test cases.
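A minimal sketch of that kind of self-test flag with argparse; the calculation, function names, and expected value are all stand-ins, not the real pyBlitz routines:

```python
import argparse

def predict_margin(spread: float, home_edge: float) -> float:
    # stand-in for the real pyBlitz calculation routines
    return spread + home_edge

def run_self_test() -> bool:
    # Fixed inputs with a known answer: if an edit to the calculation
    # changes the result, the self-test starts failing.
    return predict_margin(3.5, 7.0) == 10.5

parser = argparse.ArgumentParser()
parser.add_argument("--test", action="store_true")
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args(["--test"])   # simulating `prp score_matchup.py --test`
if args.test:
    print("test passes" if run_self_test() else "test FAILS")
```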
The second method of testing is a sub-directory of testing scripts located in test/win or test/linux, depending on your operating system. These scripts exercise the score_matchup code by providing the proper arguments for team match-ups. In the scripts I also show the setting up of the prp alias to make running the scripts quicker and easier.
Finally, I have also written five Python test scripts whose only purpose is to
make sure that your spreadsheets are set up correctly at the beginning of each
season, so that when each prediction week is run, the numbers can be as good as
possible. For example, you do not want to accidentally mix up two similarly
named teams like Florida and Florida State.
The main test script is named test.py; it will run all of the other test scripts for you. I suggest running it once a year, at the start of each season. This will be further described in the "Beginning the Season" chapter.
To run the tests, go to the terminal and type in this:
In the following image, you will see that there are missing teams. To correct
this, you need to edit the merge spreadsheets until these warnings no longer
appear. This will be further described in the "Beginning the Season" chapter.