Info on some Python packages.
- numpy
- pandas
- scipy
- networkx
- BeautifulSoup4
- conda
- More packages
numpy
- Create array:
np.array([(1.5,2,3), (4,5,6)], dtype = float)
- Shape:
arr.shape
- 1-D arrays can be made from a generator (dtype is required):
np.fromiter(list_gen, dtype=float)
- Infinity:
np.inf
- Identity matrix:
np.identity(n)
(where n is the number of rows)
- For int elements:
np.identity(3, dtype=int)
- Add two matrices:
np.add(m1, m2)
or just m1 + m2
- Broadcasting: https://numpy.org/devdocs/user/basics.broadcasting.html
- Default dtype is float
- Create empty (ie, with garbage values) matrix:
np.empty(<shape>)
- Create array full of constant values:
np.full(<shape>, val)
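- np.dot and np.matmul (@) aren't the same: they differ for arrays with more than 2 dimensions, and matmul rejects scalars
- np.resize and var.resize behave slightly differently: np.resize returns a new array and repeats the data to fill it, var.resize works in place and pads with zeros
- np.newaxis is an alias for None
- Create (main) diagonal matrix: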
np.eye(size)
- np.argmax(): index of the maximum value (along an axis, if one is given)
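The two "aren't the same" notes above can be checked with a small sketch. The array shapes and variable names here are arbitrary choices for illustration, and refcheck=False is only passed so the in-place resize does not complain about extra references in an interactive session.

```python
import numpy as np

# Stacks of matrices: 2 matrices of shape (3, 4) and 2 of shape (4, 5).
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul (@) multiplies the stacks pairwise along the leading axis.
matmul_shape = (a @ b).shape      # (2, 3, 5)

# dot combines every matrix in a with every matrix in b.
dot_shape = np.dot(a, b).shape    # (2, 3, 2, 5)

# np.resize returns a new array and repeats the data to fill the new shape.
v = np.array([1, 2, 3, 4])
repeated = np.resize(v, (2, 3))   # [[1, 2, 3], [4, 1, 2]]

# ndarray.resize works in place and pads with zeros instead.
w = np.array([1, 2, 3, 4])
w.resize((2, 3), refcheck=False)  # w is now [[1, 2, 3], [4, 0, 0]]
```

For plain 2-D matrices np.dot and @ give the same result; the difference only shows up with higher-dimensional arrays or scalars.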
Assert that two values are equal
assert np.array_equal(a, b)
Norm of a vector
https://numpy.org/devdocs/reference/generated/numpy.linalg.norm.html
Use np.linalg.norm(v)
- Many different forms of norm possible
- Default norm for a vector is the 2-norm (aka magnitude); for a matrix it is the Frobenius norm
- Take the squares of the absolute values of all elements
- Sum them up
- Take the square root of the sum
>>> a
array([1, 2, 3])
>>> np.linalg.norm(a)
np.float64(3.7416573867739413)
# (1+4+9)^½
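The three steps listed above can be written out by hand and compared with the library call. This is only a sketch; the variable names are made up, and the ord values shown are two of the several that np.linalg.norm accepts.

```python
import numpy as np

a = np.array([1, 2, 3])

# Square the absolute values, sum them, then take the square root.
manual = np.sqrt(np.sum(np.abs(a) ** 2))

# Should agree with the library call.
builtin = np.linalg.norm(a)

# Other norms are selected with the ord parameter.
l1 = np.linalg.norm(a, ord=1)          # sum of absolute values: 6.0
linf = np.linalg.norm(a, ord=np.inf)   # largest absolute value: 3.0
```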
Kronecker product
- aka tensor product.
- Use
np.kron(a, b)
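- Puts a copy of b at every location of a, scaled by the value of a there (so where the value is 1 you get b itself, and where it is 0 you get a block of zeros). Kinda.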
>>> i22 = np.identity(2, dtype=int)
>>> mul = np.array([[1, 1, 1, 0],
... [0, 0, 0, 1]])
>>> np.kron(i22, mul)
array([[1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1]])
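To see the "copy of b at every location of a" idea with values other than 0 and 1, here is a small sketch with made-up matrices. It slices the result into blocks and compares each block with the matching entry of a times b.

```python
import numpy as np

a = np.array([[1, 2],
              [0, 3]])
b = np.array([[1, 1],
              [0, 1]])

k = np.kron(a, b)   # shape (4, 4): a 2x2 grid of 2x2 blocks

# Block (i, j) of the result is a[i, j] * b.
top_right = k[0:2, 2:4]      # equals 2 * b
bottom_left = k[2:4, 0:2]    # all zeros, because a[1, 0] == 0
```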
Jargon
- Rank of shape: Number of elements in shape (ie, the number of dimensions)
- C-order: row-major memory layout (the last index varies fastest)
Axis parameter
https://stackoverflow.com/questions/48200911/very-basic-numpy-array-dimension-visualization
- nth axis => nth index
- 'order of indexing into the array'
- In a 2D matrix,
- axis=0 => rows
- axis=1 => columns
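A quick way to get a feel for the axis parameter is to sum along each axis of a small matrix. This is a sketch with a made-up matrix; the axis that is named is the index that gets collapsed.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

# axis=0 collapses the first index (rows): one result per column.
col_sums = m.sum(axis=0)    # array([5, 7, 9])

# axis=1 collapses the second index (columns): one result per row.
row_sums = m.sum(axis=1)    # array([ 6, 15])
```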
Stacking
- np.vstack: stack two matrices one below the other
- np.hstack: stack two matrices side by side
>>> a = np.array([[1,0],[2,0],[3,0]])
>>> a
array([[1, 0],
[2, 0],
[3, 0]])
>>> np.hstack([a,a])
array([[1, 0, 1, 0],
[2, 0, 2, 0],
[3, 0, 3, 0]])
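The session above only shows np.hstack. A sketch of both calls on the same matrix, with the resulting shapes, and a comparison against np.concatenate (which, for 2-D input, does the same job when given the right axis):

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 0]])   # shape (3, 2)

below = np.vstack([a, a])   # shape (6, 2): rows of a, then rows of a again
beside = np.hstack([a, a])  # shape (3, 4): same as the session above

# For 2-D input these match concatenate along axis 0 and axis 1.
same_v = np.array_equal(below, np.concatenate([a, a], axis=0))
same_h = np.array_equal(beside, np.concatenate([a, a], axis=1))
```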
Matrix creation
Create a vector/matrix of all zeros
>>> np.zeros(3)
array([0., 0., 0.])
>>> np.zeros(3).shape
(3,)
>>> np.zeros(3).reshape(1,3)
array([[0., 0., 0.]])
>>> np.zeros(3).reshape(1,3).shape
(1, 3)
###
>>> np.zeros((1,3))
array([[0., 0., 0.]])
>>> np.zeros((1,3)).shape
(1, 3)
https://numpy.org/doc/stable/reference/generated/numpy.zeros.html
Make a matrix by giving all values explicitly
>>> np.array([[0, 1], [0, 0]])
array([[0, 1],
[0, 0]])
Create a matrix out of a constant
>>> np.full((2,2), 3)
array([[3, 3],
[3, 3]])
>>> np.full((2,2), np.inf)
array([[inf, inf],
[inf, inf]])
>>> np.full((3,2), [1,2])
array([[1, 2],
[1, 2],
[1, 2]])
https://numpy.org/doc/stable/reference/generated/numpy.full.html
Create a matrix from a python list
>>> a = np.array([1,2,3])
>>> a
array([1, 2, 3])
>>> a.shape
(3,)
Matrix multiplication
- Multiply vector with matrix:
np.dot
- Concatenate two vectors:
np.concatenate([v1, v2])
- Arrays must be given together as one sequence (list or tuple), not as separate arguments
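A short sketch of both calls, with made-up values. Note that the order of the arguments to np.dot matters when one of them is a 1-D vector.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 1])

# Matrix times vector: each row of m dotted with v.
mv = np.dot(m, v)    # array([3, 7])

# Vector times matrix: v dotted with each column of m.
vm = np.dot(v, m)    # array([4, 6])

# concatenate takes the arrays as one sequence, not as separate arguments.
joined = np.concatenate([v, mv])   # array([1, 1, 3, 7])
```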
Determinant of a matrix
Remember that determinant is not defined for non-square matrices.
>>> import numpy as np
>>> np.linalg.det(np.array([[1,2],
... [3,4]]))
...
-2.0000000000000004
https://numpy.org/doc/stable/reference/generated/numpy.linalg.det.html
Eigenvalues of a matrix
>>> np.linalg.eigvals(np.array([[1,2],
... [3,4]]))
...
array([-0.37228132, 5.37228132])
# Product of the eigenvalues is the determinant
>>> np.prod(np.linalg.eigvals(np.array([[1,2],[3,4]])))
-1.9999999999999998
https://numpy.org/doc/stable/reference/generated/numpy.linalg.eigvals.html
Reading from csv
- np.loadtxt: usable when there are no missing values
- np.genfromtxt: usable even when there are missing values
- https://numpy.org/doc/stable/user/how-to-io.html
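The difference between the two readers shows up as soon as a field is empty. The sketch below uses in-memory strings in place of real files (the data is made up) and assumes comma-separated values.

```python
import io
import numpy as np

# In-memory stand-ins for CSV files (hypothetical data).
complete = io.StringIO("1,2,3\n4,5,6\n")
with_gap = io.StringIO("1,2,3\n4,,6\n")

# loadtxt expects every field to be present.
full = np.loadtxt(complete, delimiter=",")

# genfromtxt fills missing fields, with nan by default for float data.
gappy = np.genfromtxt(with_gap, delimiter=",")
```

Both functions also accept a file name in place of the StringIO object.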
Doubts
- np.resize vs np.pad
- np.broadcast_to
Misc
- Packages for type hints:
Theory
- Matrix addition is commutative, but multiplication isn't
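One pair of matrices is enough to show that multiplication does not commute. The matrices below are arbitrary; b just swaps rows or columns depending on which side it multiplies from.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

add_commutes = np.array_equal(a + b, b + a)   # True
mul_commutes = np.array_equal(a @ b, b @ a)   # False for this pair
```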
Pandas
Convert a 2D frame to a 1D frame (ie, DataFrame to Series):
df.stack()
Read from csv with first column as row index:
pd.read_csv("~/b.csv", index_col=0)
Write csv:
df.to_csv("~/out.csv")
Index using number instead of names:
df.iloc[0]
Column names:
df.columns
Row names:
df.index
Use
pd.read_csv()
Convert a Series to np.array:
s.to_numpy()
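The one-liners above can be tried together on a tiny frame. This sketch reads from an in-memory string instead of a file; the column and row names are made up.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; first column holds the row names.
csv_text = io.StringIO("name,x,y\nr1,1,2\nr2,3,4\n")
df = pd.read_csv(csv_text, index_col=0)

cols = list(df.columns)      # ['x', 'y']
rows = list(df.index)        # ['r1', 'r2']

first_row = df.iloc[0]       # by position; same row as df.loc['r1']
stacked = df.stack()         # Series indexed by (row name, column name)
values = stacked.to_numpy()  # array([1, 2, 3, 4])
```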
level??
axis
- 0: stacking one on top of another
Merge two Series to make a DataFrame
>>> pd.concat([s1, s2], axis=1)
>>> pd.concat([s1, s2], axis=1).columns
RangeIndex(start=0, stop=2, step=1)
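The columns come out as 0 and 1 above because the Series had no names. A sketch with named Series (the names and values are made up), showing both axis choices:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name="a")
s2 = pd.Series([4, 5, 6], name="b")

# axis=1: side by side, one column per Series (named after each Series).
wide = pd.concat([s1, s2], axis=1)    # shape (3, 2), columns ['a', 'b']

# axis=0: one on top of another, giving a single longer Series.
tall = pd.concat([s1, s2], axis=0)    # length 6
```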
Examples
# Read from a csv file
>>> df = pd.read_csv("name.csv")
# Columns
>>> df.columns
Index(['Site Type', 'Used', 'Fixed', 'Prohibited', 'Available', 'Util%'], dtype='object')
# Row index (16 rows here)
>>> df.index
RangeIndex(start=0, stop=16, step=1)
scipy
Sparse matrix
https://docs.scipy.org/doc/scipy/reference/sparse.html
>>> import scipy.sparse
>>> import numpy as np
>>> X = scipy.sparse.csr_matrix(1./2.*np.array([[0.,1.],[1.,0.]]))
>>> X
<2x2 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> print(X)
(0, 1) 0.5
(1, 0) 0.5
Curve fitting
Find the parameters of a function so that it best matches a set of data points.
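The note above does not name a function. One option in scipy is scipy.optimize.curve_fit, sketched below. The straight-line model, the function name line, and the data points are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    """Model to fit; the straight line is an arbitrary choice for this sketch."""
    return slope * x + intercept

# Hypothetical data points lying exactly on y = 2x + 1.
xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ydata = 2.0 * xdata + 1.0

# popt holds the best-fit parameters, pcov their estimated covariance.
popt, pcov = curve_fit(line, xdata, ydata)
slope, intercept = popt
```

The first argument of the model function is the independent variable; the remaining ones are the parameters being fitted.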
networkx
Graph attributes
- Adjacency matrix:
nx.adjacency_matrix(G)
- Attribute matrix:
nx.attr_matrix(G, <attr-name>)
- Eg:
nx.attr_matrix(G, "weight")
Add nodes
- Insert single node: G.add_node(<node>)
- Insert nodes from a list: G.add_nodes_from(<list>)
Add edges
Adding edges implicitly adds nodes.
Edges incident on a node in undirected graphs:
G.edges(<node>)
Edges incoming to a node in directed graphs:
G.in_edges(<node>)
Edges outgoing from a node in directed graphs:
G.out_edges(<node>)
Check if an edge exists:
DiGraph.has_edge(n1, n2)
Add edge with weight:
G.add_edge(n1, n2, weight=<weight>)
Node attributes:
G.nodes[<node>]
(G[<node>] gives the neighbours of the node and the attributes of the connecting edges instead)
Edge attributes:
G[n1][n2]['<attr-name>'] = <attr-val>
Get edge data:
G.get_edge_data(n1, n2)
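The calls above, put together on a small directed graph. This is a sketch; the node names, the colour attribute, and the weights are made up.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("a", colour="red")           # node with an attribute
G.add_nodes_from(["b", "c"])
G.add_edge("a", "b", weight=3)
G.add_edge("c", "a", weight=5)
G.add_edge("b", "d")                    # "d" is created implicitly

incoming = list(G.in_edges("a"))        # [('c', 'a')]
outgoing = list(G.out_edges("a"))       # [('a', 'b')]
exists = G.has_edge("a", "b")           # True; ('b', 'a') is a different edge

node_attrs = G.nodes["a"]               # {'colour': 'red'}
G["a"]["b"]["weight"] = 4               # update an edge attribute
edge_attrs = G.get_edge_data("a", "b")  # {'weight': 4}
```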
BeautifulSoup4
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- Install: pip3 install BeautifulSoup4
import pathlib
from bs4 import BeautifulSoup
# Read the HTML file and parse it with the built-in parser
html_path = pathlib.Path("/home/user/Downloads/input.html")
htmlstr = html_path.read_text()
soup = BeautifulSoup(htmlstr, "html.parser")
content = soup.find(id='content-container')
Finding
- Find all elements matching class:
soup.find_all("tagname", class_="classname")
- Find by id:
soup.find(id='content-container')
Children
- tag.contents: immediate children (sub-tags and text) as a list
- tag.children: immediate children as an iterator
- tag.descendants: all nested children, at any depth, as a generator
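A sketch of the finding and children calls on a small in-memory document. The HTML is made up and only large enough to exercise each call; "html.parser" is the parser bundled with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, just large enough to exercise each call.
html = (
    '<div id="content-container">'
    '<p class="note">one <b>bold</b></p>'
    '<p class="note">two</p>'
    '<p>three</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

notes = soup.find_all("p", class_="note")    # the two <p class="note"> tags
content = soup.find(id="content-container")  # the outer <div>

direct = content.contents                    # list of the three <p> tags
direct_again = list(content.children)        # same items, via an iterator
everything = list(content.descendants)       # also nested tags and text
```

Text nodes count as children too, which is why descendants yields more items than there are tags.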
conda
- Create new conda environment:
conda create -n <env-name> [python=3.6.8] [packages needed]
- Activate conda environment:
conda activate <env-name>
- Install pip to the environment:
conda install pip
- conda envs don't come with pip by default. Gotta install it.
- Install git:
conda install git
- Delete a conda environment:
conda env remove -n <env-name>
- Delete a package installed with conda (not with pip):
conda remove <package-name>
More packages
- bidict: bijective maps (ie, 2-way dictionary)
Tools:
- pylint, flake8: linters
- mypy, pytype: static type checkers
- pytest: unit testing
- Show execution time of tests:
pytest --durations=0 # 0 means report every test, not just the N slowest