Commit b3d479f

Added license, doc cleanup, and updated the arg parser cli to work
1 parent 346e472 commit b3d479f

5 files changed: +755 -24 lines changed

.gitignore

+2 -1
@@ -1,4 +1,5 @@
 __pycache__
 env
 input
-output
+output
+example

README.md

+66 -7
@@ -1,9 +1,11 @@
 # CSS Extractor
 
+The handiest script for UI development. Copy your favorite elements from the internet and get back `HTML` and `CSS` files that you can use and modify right away. No more manually copying and pasting elements from stylesheets.
+
 ## Installation
 
 1. Create a Python Virtual Environment
-You can use `conda` or something else, this is the fastest for me.
+You can use `conda` or something else, but this is the fastest for me.
 ```
 python -m venv /path/to/new/virtual/environment
 ```
@@ -14,25 +16,82 @@ Make sure to activate your virtual environment before running!
 pip3 install -r requirements.txt
 ```
 
+## Usage
+
+The extraction process is a 2-step process.
+
+1. HTML Extraction
+We first analyze the input HTML markup and create a map of old classes to new ones.
+
+For example, if you have an HTML element like the following:
+```html
+<h1 class="big red slant">This is a header</h1>
+```
+
+We would generate a mapping that is JSON-encoded like this:
+```json
+{
+    "newClass": "RANDOM_STRING_CLASS_NAME",
+    "oldClasses": "big, red, slant"
+}
+```
+
+And produce an output HTML file like this:
+```html
+<h1 class="RANDOM_STRING_CLASS_NAME">This is a header</h1>
+```
+
+2. CSS Extraction
+The JSON file generated in the previous step is then fed into the CSS extraction process, which builds our new class with all the styles from the old classes. The output class will have the old class names annotated in the CSS as comments for debugging purposes.
+```css
+.RANDOM_STRING_CLASS_NAME {
+    /* big */
+    font-size: 50px;
+    /* red */
+    color: red;
+    /* slant */
+    font-style: italic;
+}
+```
 
-## Useage
+> We keep this as a manual 2-step process to allow for debugging after the whole process is completed. We may change this to a 1-step process in the future.
 
 ### HTML CSS Class Extraction
 
+This is an example of how to run the extraction:
+
+```bash
+python3 main.py --html --input-html-file input.html --output-html-file output.html --output-json-file output.json
+```
+
 ### CSS Class Extraction and Compression
 
+```bash
+python3 main.py --css --input-css-file input.css --input-json-file input.json --output-css-file output.css
+```
+
 ## Why?
 
-I like building websites with raw CSS but building UI's with it can be hard and annoying. I'm also not the most creative and it's a pain to write, so I end up copying elements or sections of content that I see on the internet.
+I enjoy building websites with raw CSS, but creating UIs with it can be difficult and frustrating. I'm also not the most creative person, and writing CSS from scratch can be tedious, so I often end up copying elements or sections of content that I find on the internet.
 
-After doing years of manual extraction of sites styles that I like, I decied to make a library that could do this for me. This library has saved me hours of manually reading through markup files and allows me to experiement much quicker.
+After years of manually extracting styles from websites I liked, I decided to create a library to automate this process. This library has saved me hours of manually reading through markup files and allows me to experiment much faster.
 
 ## How does it work?
 
-The first interation used regex to find and grab the css defintions from our desired input classes. This method actually turned out to be a bad approach because css is complicated expescially when you're dealing with @media queries and psudo selectors. There are some leftover functions that use regex matching, since I didn't bother to rewrite.
+The first iteration used regex to find and grab the CSS definitions from the desired input classes. However, this approach turned out to be problematic because CSS is complex, especially when you're dealing with @media queries and pseudo-selectors. There are still some leftover functions that use regex matching, as I didn't bother to rewrite them.
 
-I ended up picking up the `tinycss` library that handles parsing text into into a way that it can be programatically analyzed. This is way better to used with @media handlers and psudo selectors. This approach is a bit slow since we're doing this in a bunch of nested loops and the library is not built for speed. Once we iron out edge cases we can look at how we can improve the speed of the program.
+I eventually started using the `tinycss2` library, which parses CSS text into a format that can be programmatically analyzed. This is much better for handling @media queries and pseudo-selectors. This approach is a bit slow, as it involves many nested loops, and the library is not optimized for speed. Once we resolve the edge cases, we can focus on improving the program's performance.
 
 ## Caveats
 
-This is a very basic implementation. This has gotten me pretty far. There are cetain edge cases that the library does not catch. There are instances of classes that are nested (A child element within a parent element with a certain class). Not selectors are tricky and not caught. Feel free to add a issue for these bugs you find. If it's feasable we'll fix it.
+- This is a very basic implementation, but it has served me well so far. There are certain edge cases that the library does not handle. For example, it struggles with nested classes (a child element within a parent element with a specific class). This nested class example will break the script.
+
+- "Not" selectors and complex pseudo-selectors are also tricky and not detected. Feel free to open an issue for any bugs you encounter. If it's feasible, we'll fix them.
+
+- The formatting of the output files is a bit janky. In the future, we can clean this up, but for now, it's easy enough to reformat CSS and HTML in a code editor or IDE.
+
+- The tool currently only works with single files. If the CSS is split across multiple files on a website, you will have to manually combine them for the `input.css`. Similarly, if CSS is embedded in a style tag within the HTML, you'll need to extract that and place it in a separate file.
+
+- Styling and look are heavily affected by the CSS resets applied. Make sure that you're using a similar reset file. For example, if you're cloning Tailwind CSS, make sure to include the Tailwind CSS reset. Other styling frameworks should work the same.
+
+- If a class is not found in the input stylesheet, the library makes a best effort to mark which class could not be found. This appears right after the CSS class definition as a comment.
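
Reviewer note: the two steps described in the new README section can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the repository's code. The helper names `random_class_name`, `build_class_map`, and `build_css` are hypothetical, the stylesheet is a hand-written dict standing in for parsed `tinycss2` output, the 8-character name length is arbitrary, and reusing one new class for each distinct combination of old classes is an assumption.

```python
import json
import random
import string


def random_class_name(rng: random.Random, length: int = 8) -> str:
    """Return a random lowercase class name (the length is an assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_class_map(class_lists, rng):
    """Step 1: map each distinct combination of old classes to one new class."""
    seen = {}
    for classes in class_lists:
        key = ", ".join(classes)
        if key not in seen:
            seen[key] = random_class_name(rng)
    # Same shape as the JSON example in the README.
    return [{"newClass": new, "oldClasses": old} for old, new in seen.items()]


def build_css(class_map, stylesheet):
    """Step 2: merge the old classes' declarations into each new class."""
    rules = []
    for entry in class_map:
        lines = [f".{entry['newClass']} {{"]
        for old in entry["oldClasses"].split(", "):
            # Annotate every group of declarations with its source class.
            lines.append(f"    /* {old} */")
            if old not in stylesheet:
                # Hypothetical marker for a class missing from the input CSS.
                lines.append("    /* class not found in input stylesheet */")
                continue
            for prop, value in stylesheet[old].items():
                lines.append(f"    {prop}: {value};")
        lines.append("}")
        rules.append("\n".join(lines))
    return "\n\n".join(rules)


# Stand-in for a parsed stylesheet: class name -> {property: value}.
stylesheet = {
    "big": {"font-size": "50px"},
    "red": {"color": "red"},
    "slant": {"font-style": "italic"},
}
class_map = build_class_map([["big", "red", "slant"]], random.Random(0))
print(json.dumps(class_map, indent=2))
print(build_css(class_map, stylesheet))
```

The sketch ignores @media queries, pseudo-selectors, and nested selectors, which are the cases the README says required a real CSS parser.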

html_parser.py

+5 -7
@@ -3,8 +3,8 @@
 import string
 import json
 
-def parse_html():
-    with open('input/input.html', 'r') as f:
+def parse_html(input_html_file: str, output_json_file: str, output_html_file: str):
+    with open(input_html_file, 'r') as f:
         webpage = f.read()
     soup = BeautifulSoup(webpage, features="html.parser")
     classes_map = {}
@@ -27,10 +27,8 @@ def parse_html():
         if node.has_attr('class'):
             break
 
-    with open("output/output.html", "w") as file:
+    with open(output_html_file, "w") as file:
         file.write(str(soup))
 
-    with open('input/input.json', 'w') as fout:
-        json.dump(new_classes_list, fout)
-
-parse_html()
+    with open(output_json_file, 'w') as fout:
+        json.dump(new_classes_list, fout)
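
Reviewer note: `main.py` is not among the files shown here, so the following is only a guess at how the flags from the README could be wired to the new `parse_html` signature using `argparse`. The flag names come from the README commands and the keyword names from the diff above. Everything else is an assumption: the helpers `build_parser` and `parse_html_kwargs` are hypothetical, and making `--html` and `--css` mutually exclusive is a design choice of this sketch, not something the commit confirms.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the README commands."""
    parser = argparse.ArgumentParser(description="CSS Extractor (illustrative CLI sketch)")

    # Assumption: exactly one of the two steps is run per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--html", action="store_true", help="run the HTML extraction step")
    mode.add_argument("--css", action="store_true", help="run the CSS extraction step")

    # Files used by the HTML step.
    parser.add_argument("--input-html-file")
    parser.add_argument("--output-html-file")
    parser.add_argument("--output-json-file")

    # Files used by the CSS step.
    parser.add_argument("--input-css-file")
    parser.add_argument("--input-json-file")
    parser.add_argument("--output-css-file")
    return parser


def parse_html_kwargs(args: argparse.Namespace) -> dict:
    """Collect keyword arguments matching the new parse_html() signature."""
    return {
        "input_html_file": args.input_html_file,
        "output_json_file": args.output_json_file,
        "output_html_file": args.output_html_file,
    }


# Parse the HTML-step command from the README without touching sys.argv.
args = build_parser().parse_args([
    "--html",
    "--input-html-file", "input.html",
    "--output-html-file", "output.html",
    "--output-json-file", "output.json",
])
print(parse_html_kwargs(args))
```

In a real entry point the dict would be passed on as `parse_html(**parse_html_kwargs(args))` when `args.html` is set. Passing the paths as arguments, as this commit does, also removes the module-level `parse_html()` call, so importing `html_parser` no longer triggers file I/O.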
